incapsula package
Submodules
incapsula.errors module
-
exception incapsula.errors.IncapBlocked(response, *args)
Bases: ValueError
Base exception for exceptions in this module.
Parameters:
- response (requests.Response) – The response which was being processed when this error was raised.
- *args – Additional arguments to pass to ValueError.
-
exception incapsula.errors.MaxRetriesExceeded(response, *args)
Bases: incapsula.errors.IncapBlocked
Raised when the number of attempts to bypass incapsula has exceeded the amount specified.
Parameters:
- response (requests.Response) – The response which was being processed when this error was raised.
- *args – Additional arguments to pass to ValueError.
-
exception incapsula.errors.RecaptchaBlocked(response, *args)
Bases: incapsula.errors.IncapBlocked
Raised when re-captcha is encountered.
Parameters:
- response (requests.Response) – The response which contains the re-captcha.
- *args – Additional arguments to pass to ValueError.
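Because both specific errors derive from IncapBlocked, a single handler for the base class catches either of them. The following is a minimal sketch using stand-in classes that mirror the documented signatures; they are not imported from the package. Storing the response on a `response` attribute is an assumption, and `FakeResponse` is a hypothetical placeholder for `requests.Response`.

```python
class IncapBlocked(ValueError):
    """Stand-in mirroring the documented base exception."""

    def __init__(self, response, *args):
        # Assumption: the response is kept on an attribute for later inspection.
        self.response = response
        super().__init__(*args)  # remaining arguments go to ValueError


class MaxRetriesExceeded(IncapBlocked):
    """Stand-in: too many bypass attempts."""


class RecaptchaBlocked(IncapBlocked):
    """Stand-in: a re-captcha was encountered."""


class FakeResponse:
    """Hypothetical placeholder for requests.Response."""
    url = "http://www.example.com/"


def describe(exc):
    """Raise exc and report which handler caught it."""
    try:
        raise exc
    except RecaptchaBlocked as e:
        return "captcha at " + e.response.url
    except MaxRetriesExceeded as e:
        return "gave up on " + e.response.url
    except IncapBlocked as e:
        return "blocked at " + e.response.url
```

Listing the subclasses before the base class matters here: Python picks the first matching `except` clause, so a leading `except IncapBlocked` would shadow the more specific handlers.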
incapsula.parsers module
-
class incapsula.parsers.IframeResourceParser(response)
Bases: incapsula.parsers.ResourceParser
Parser object to obtain the contents of the incapsula iframe.
Parameters: response (requests.Response) – The response of the request sent to the incapsula iframe URL.
-
default_find_recaptcha_args = [('form', {'id': 'captcha-form'}), ('div', {'class': 'g-recaptcha'})]
-
extra_find_recaptcha_args = []
-
is_blocked()
Determine whether the iframe content is a Google reCAPTCHA.
This is determined by iterating over the combined entries of default_find_recaptcha_args and extra_find_recaptcha_args and checking whether a matching element is found in the document.
Returns: True if the iframe contains a Google reCAPTCHA.
Return type: bool
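The lookup described above can be sketched with the standard library alone. This is an illustration of the idea, not the package's code: the real parser returns bs4 elements, whereas the `_TagCollector` and `_matches` helpers here are hypothetical stand-ins built on `html.parser`.

```python
from html.parser import HTMLParser

# Copied from the documented default_find_recaptcha_args value.
DEFAULT_FIND_RECAPTCHA_ARGS = [
    ("form", {"id": "captcha-form"}),
    ("div", {"class": "g-recaptcha"}),
]


class _TagCollector(HTMLParser):
    """Record every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def _matches(tag, attrs, name, wanted):
    """True if the tag has the given name and every wanted attribute."""
    if tag != name:
        return False
    for key, value in wanted.items():
        have = attrs.get(key) or ""
        # Treat class as a space-separated list, as HTML does.
        ok = value in have.split() if key == "class" else have == value
        if not ok:
            return False
    return True


def is_blocked(html, extra_find_recaptcha_args=()):
    """Return True if any default or extra spec matches an element."""
    collector = _TagCollector()
    collector.feed(html)
    specs = list(DEFAULT_FIND_RECAPTCHA_ARGS) + list(extra_find_recaptcha_args)
    return any(
        _matches(tag, attrs, name, wanted)
        for name, wanted in specs
        for tag, attrs in collector.tags
    )
```

Passing extra specs plays the role of extra_find_recaptcha_args: a site with a differently marked captcha can be covered without touching the defaults.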
-
recaptcha_element
Recaptcha element in the document.
Return type: bs4.element.Tag
-
-
class incapsula.parsers.ResourceParser(response)
Bases: object
Superclass for all other parser objects.
Parameters: response (requests.Response) – Response from GET request.
-
is_blocked()
Override this method to determine whether or not the resource is blocked.
Note
If this class is passed into IncapSession as the resource_parser parameter then this method will be used to determine whether to attempt to bypass incapsula or raise a MaxRetriesExceeded error on too many retries.
Note
If this class is passed into IncapSession as the iframe_parser parameter then this method will be used to determine whether to raise a RecaptchaBlocked error when a re-captcha is encountered.
Returns: True if resource is blocked, otherwise False.
-
-
class incapsula.parsers.WebsiteResourceParser(response)
Bases: incapsula.parsers.ResourceParser
Parser object to extract the robots meta element, incapsula iframe element, and the incapsula iframe URL.
Parameters: response (requests.Response) – The response of the request sent to the targeted host.
-
default_find_iframe_args = [('iframe', {'src': re.compile('^/_Incapsula_Resource.*')}), ('iframe', {'src': re.compile('^//content\.incapsula\.com.*')})]
-
extra_find_iframe_args = []
-
incapsula_iframe
The iframe which contains the JavaScript code that runs on browser load.
Return type: bs4.element.Tag
-
incapsula_iframe_url
The src attribute value of the incapsula iframe.
Return type: str
-
is_blocked()
Determine whether the resource is blocked by incapsula or not.
If the resource has the <meta name="ROBOTS"> tag and the incapsula iframe then we can assume the resource is blocked.
Returns: True if the robots meta tag and the incapsula iframe are both found in the document.
Return type: bool
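The two-part check can be sketched as follows. The patterns are the ones listed in default_find_iframe_args; everything else is an assumption for illustration, including the `_BlockScanner` helper, the use of `html.parser` in place of bs4, and the case-insensitive comparison of the meta name.

```python
import re
from html.parser import HTMLParser

# Patterns copied from the documented default_find_iframe_args value.
IFRAME_SRC_PATTERNS = [
    re.compile(r"^/_Incapsula_Resource.*"),
    re.compile(r"^//content\.incapsula\.com.*"),
]


class _BlockScanner(HTMLParser):
    """Note whether a robots meta tag and an incapsula iframe were seen."""

    def __init__(self):
        super().__init__()
        self.has_robots_meta = False
        self.iframe_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the meta name is compared case-insensitively.
        if tag == "meta" and (attrs.get("name") or "").upper() == "ROBOTS":
            self.has_robots_meta = True
        elif tag == "iframe" and self.iframe_url is None:
            src = attrs.get("src") or ""
            if any(p.match(src) for p in IFRAME_SRC_PATTERNS):
                self.iframe_url = src  # keep the first matching src


def scan(html):
    """Parse the document and return the populated scanner."""
    scanner = _BlockScanner()
    scanner.feed(html)
    return scanner


def is_blocked(html):
    """Blocked only when both the meta tag and the iframe are present."""
    s = scan(html)
    return s.has_robots_meta and s.iframe_url is not None
```

Requiring both signals keeps the check conservative: a page that merely carries a robots meta tag, or embeds an unrelated iframe, is not reported as blocked.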
-
robots_meta
The robots meta tag commonly found in incapsula-blocked resources.
Return type: bs4.element.Tag
-
incapsula.session module
-
class incapsula.session.IncapSession(max_retries=3, user_agent=None, cookie_domain='', resource_parser=<class 'incapsula.parsers.WebsiteResourceParser'>, iframe_parser=<class 'incapsula.parsers.IframeResourceParser'>)
Bases: requests.sessions.Session
Session object to bypass sites which are guarded by incapsula.
Parameters:
- max_retries – The number of times to attempt to get the incapsula resource before raising a MaxRetriesExceeded error. Set this to None to never give up.
- user_agent – Change the default user agent when sending requests.
- cookie_domain – Use this parameter to change the domain which is set in the cookie. Sometimes the domain set for the cookie isn't the same as the actual host, e.g. .domain.com instead of www.domain.com.
- resource_parser – ResourceParser to use when checking whether the website served back a page which is blocked by incapsula. Default: WebsiteResourceParser.
- iframe_parser – ResourceParser class (not instance) to use when checking whether the iframe contains a captcha. Default: IframeResourceParser.
-
crack(resp, org=None, tries=0)
If the response is blocked by incapsula then set the necessary cookies and attempt to bypass it.
Parameters:
- resp – Response to check.
- org – Original response. Used only when called recursively.
- tries – Number of attempts. Used only when called recursively.
Returns:
-
get(url, bypass_crack=False, **kwargs)
Override Session.get().
Parameters:
- url – URL for the new Request object.
- bypass_crack – Use when sending a request that you don't want to go through the incapsula crack.
- kwargs – Optional arguments that request takes.
The bypass_crack flag is used within this class so that sending a GET request from this instance doesn't create an infinite loop, where .get() calls .crack(), which calls .get() again, and so on. Also, requests made to get incapsula resources don't need to be cracked.
Return type: requests.Response
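The interplay between get() and crack() can be sketched without any network access. This shows the control flow only, under stated assumptions: `SketchSession`, its `fetch` and `is_blocked` callables, and the exact retry comparison are hypothetical, and the documented crack(resp, org=None, tries=0) signature is simplified to carry the URL instead of the original response.

```python
class MaxRetriesExceeded(ValueError):
    """Stand-in for incapsula.errors.MaxRetriesExceeded."""


class SketchSession:
    """Hypothetical stand-in showing the get()/crack() control flow only."""

    def __init__(self, fetch, is_blocked, max_retries=3):
        self.fetch = fetch            # assumed callable: url -> response
        self.is_blocked = is_blocked  # assumed callable: response -> bool
        self.max_retries = max_retries
        self.calls = []               # (url, bypass_crack) pairs, for inspection

    def get(self, url, bypass_crack=False):
        self.calls.append((url, bypass_crack))
        resp = self.fetch(url)
        # Requests issued from inside crack() skip cracking, which ends the loop.
        return resp if bypass_crack else self.crack(resp, url)

    def crack(self, resp, url, tries=0):
        if not self.is_blocked(resp):
            return resp
        # max_retries=None means never give up, as documented for IncapSession.
        if self.max_retries is not None and tries >= self.max_retries:
            raise MaxRetriesExceeded("still blocked after %d tries" % tries)
        # The real session sets cookies at this point; the sketch only re-requests.
        retry = self.get(url, bypass_crack=True)
        return self.crack(retry, url, tries + 1)
```

Without the flag, the request made inside crack() would itself be cracked, and a page that stays blocked would recurse until the interpreter's recursion limit instead of stopping at max_retries.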
-
get_incapsula_resource_url(scheme, host)
Override this method to change the GET request after the cookies are set.
After the cookies are set, a GET request must be sent to validate the session. Override this method to return a different URL to send the GET request to. This method is more of a future-proofing measure than anything.
Parameters:
- scheme – 'http' or 'https'.
- host – The host of the incapsula resource URL, e.g. 'www.example.com'.
-
incapsula.session.simple_digest(s)
Create a sum of the ordinal values of the characters passed in from s.
Parameters: s – The string to calculate the digest from.
Returns: Sum of ordinal values converted to a string.
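A minimal sketch that matches this description, written from the documented behavior rather than copied from the package source:

```python
def simple_digest(s):
    """Sum the ordinal values of the characters in s and return it as a string."""
    return str(sum(ord(ch) for ch in s))
```

For example, `simple_digest("abc")` adds 97, 98 and 99 and returns `"294"`.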
-
incapsula.session.test()
Quote each value in the tuple list and return a comma-delimited string of the parameters.
This function is a shortened version of incapsula's test method. The original method checks for specific plugins in your browser and sets a cookie based on which extensions you have installed. The list of values is taken from my own browser after running the test method, so they are all valid.
This is a shortcut rather than an attempt to reverse engineer their entire code.
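The quoting and joining step might look like the sketch below. The package's real value list and tuple layout are not shown in this document, so `SAMPLE_PARAMS`, the (name, value) shape, and the `encode_params` name are all hypothetical; only `urllib.parse.quote` is a real call.

```python
from urllib.parse import quote

# Hypothetical sample values; the package's real list is not shown here.
SAMPLE_PARAMS = [
    ("navigator", "true"),
    ("plugin", "Chrome PDF Viewer"),
]


def encode_params(params):
    """Percent-encode each name=value pair and join the results with commas."""
    return ",".join(
        quote("%s=%s" % (name, value), safe="") for name, value in params
    )
```

With `safe=""` every reserved character is escaped, so `encode_params(SAMPLE_PARAMS)` yields `navigator%3Dtrue,plugin%3DChrome%20PDF%20Viewer`.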