incapsula package

Submodules

incapsula.errors module

exception incapsula.errors.IncapBlocked(response, *args)

Bases: ValueError

Base exception for exceptions in this module.

Parameters:
  • response (requests.Response) – The response which was being processed when this error was raised.
  • *args

    Additional arguments to pass to ValueError.

exception incapsula.errors.MaxRetriesExceeded(response, *args)

Bases: incapsula.errors.IncapBlocked

Raised when the number of attempts to bypass incapsula exceeds the specified maximum.

Parameters:
  • response (requests.Response) – The response which was being processed when this error was raised.
  • *args

    Additional arguments to pass to ValueError.

exception incapsula.errors.RecaptchaBlocked(response, *args)

Bases: incapsula.errors.IncapBlocked

Raised when re-captcha is encountered.

Parameters:
  • response (requests.Response) – The response which contains the re-captcha.
  • *args

    Additional arguments to pass to ValueError.
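Because both subclasses derive from IncapBlocked, which itself derives from ValueError, the order of except clauses matters: the most specific classes should be checked first. The following is a minimal sketch using local stand-in classes that only mirror the documented bases and signatures, so it runs without the package installed. Storing the response on a `response` attribute is an assumption, and `describe` is a hypothetical helper.

```python
class IncapBlocked(ValueError):
    """Stand-in mirroring the documented base exception."""

    def __init__(self, response, *args):
        # Assumption: the response is kept on the instance for later inspection.
        self.response = response
        super().__init__(*args)


class MaxRetriesExceeded(IncapBlocked):
    """Stand-in: the retry limit was reached."""


class RecaptchaBlocked(IncapBlocked):
    """Stand-in: a re-captcha was served."""


def describe(exc):
    """Return a short label, checking the most specific classes first."""
    try:
        raise exc
    except RecaptchaBlocked:
        return "recaptcha"
    except MaxRetriesExceeded:
        return "max retries"
    except IncapBlocked:
        return "blocked"
```

With the real package, the same clause ordering would apply to the classes in incapsula.errors.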

incapsula.parsers module

class incapsula.parsers.IframeResourceParser(response)

Bases: incapsula.parsers.ResourceParser

Parser object to obtain the contents of the incapsula iframe.

Parameters: response (requests.Response) – The response of the request sent to the incapsula iframe url.
default_find_recaptcha_args = [('form', {'id': 'captcha-form'}), ('div', {'class': 'g-recaptcha'})]
extra_find_recaptcha_args = []
is_blocked()

Determine whether the iframe contents are a google recaptcha.

This is determined by iterating over the combined entries of default_find_recaptcha_args and extra_find_recaptcha_args and checking whether a matching element is found in the document.

Returns: True if the iframe contains a google recaptcha.
Return type: bool
recaptcha_element

Recaptcha element in the document.

Return type: bs4.element.Tag
class incapsula.parsers.ResourceParser(response)

Bases: object

Superclass for all other parser objects.

Parameters: response (requests.Response) – Response from GET request.
is_blocked()

Override this method to determine whether or not the resource is blocked.

Note

If this class is passed into IncapSession as the resource_parser parameter then this method will be used to determine whether to attempt to bypass incapsula or raise a MaxRetriesExceeded error on too many retries.

Note

If this class is passed into IncapSession as the iframe_parser parameter then this method will be used to determine whether to raise a RecaptchaBlocked error when a re-captcha is encountered.

Returns: True if the resource is blocked, otherwise False.
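As an illustration of overriding is_blocked(), here is a sketch of a custom parser that treats a page as blocked when a marker string appears in the body. Everything here is a stand-in so the example runs on its own: the local ResourceParser only mirrors the documented constructor, the `response` attribute name is an assumption, and KeywordResourceParser, its marker text, and FakeResponse are hypothetical.

```python
class ResourceParser(object):
    """Stand-in for the documented superclass, which takes a response."""

    def __init__(self, response):
        # Assumption: the response is stored under this attribute name.
        self.response = response

    def is_blocked(self):
        raise NotImplementedError


class KeywordResourceParser(ResourceParser):
    """Hypothetical parser: blocked if a marker string appears in the body."""

    marker = "Request unsuccessful"  # hypothetical marker text

    def is_blocked(self):
        return self.marker in self.response.text


class FakeResponse(object):
    """Minimal object exposing .text, standing in for requests.Response."""

    def __init__(self, text):
        self.text = text
```

A subclass of the real ResourceParser written along these lines would be passed as the class itself, not an instance, to the resource_parser or iframe_parser parameter of IncapSession.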
class incapsula.parsers.WebsiteResourceParser(response)

Bases: incapsula.parsers.ResourceParser

Parser object to extract the robots meta element, incapsula iframe element, and the incapsula iframe url.

Parameters: response (requests.Response) – The response of the request sent to the targeted host.
default_find_iframe_args = [('iframe', {'src': re.compile('^/_Incapsula_Resource.*')}), ('iframe', {'src': re.compile('^//content\.incapsula\.com.*')})]
extra_find_iframe_args = []
incapsula_iframe

The iframe which contains the javascript code that runs on browser load.

Return type: bs4.element.Tag
incapsula_iframe_url

The src attribute value of the incapsula iframe.

Return type: str
is_blocked()

Determine whether the resource is blocked by incapsula or not.

If the resource has both the <meta name="ROBOTS"> tag and the incapsula iframe, we can assume the resource is blocked.

Returns: True if the robots meta tag and the incapsula iframe are both found in the document.
Return type: bool
robots_meta

The robots meta tag commonly found in incapsula-blocked resources.

Return type: bs4.element.Tag
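To see which src values the two patterns in default_find_iframe_args accept, the sketch below applies them directly with the standard re module. The helper name `looks_like_incapsula_src` and the sample URLs in the usage note are hypothetical; only the two patterns come from the attribute listed above.

```python
import re

# Patterns as listed in default_find_iframe_args.
IFRAME_SRC_PATTERNS = [
    re.compile(r'^/_Incapsula_Resource.*'),
    re.compile(r'^//content\.incapsula\.com.*'),
]


def looks_like_incapsula_src(src):
    """Return True if src matches any of the patterns (hypothetical helper)."""
    return any(pattern.search(src) for pattern in IFRAME_SRC_PATTERNS)
```

Both patterns are anchored with `^`, so a relative path such as `/_Incapsula_Resource?x=1` or a protocol-relative URL such as `//content.incapsula.com/page.html` matches, while an absolute URL on another host does not. Additional (name, attrs) pairs of the same shape could presumably be supplied through extra_find_iframe_args.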

incapsula.session module

class incapsula.session.IncapSession(max_retries=3, user_agent=None, cookie_domain='', resource_parser=<class 'incapsula.parsers.WebsiteResourceParser'>, iframe_parser=<class 'incapsula.parsers.IframeResourceParser'>)

Bases: requests.sessions.Session

Session object to bypass sites which are guarded by incapsula.

Parameters:
  • max_retries – The number of times to attempt to get the incapsula resource before raising a MaxRetriesExceeded error. Set this to None to never give up.
  • user_agent – Change the default user agent when sending requests.
  • cookie_domain – Use this param to change the domain which is set in the cookie. Sometimes the domain set for the cookie isn't the same as the actual host, e.g. .domain.com instead of www.domain.com.
  • resource_parser – ResourceParser to use when checking whether the website served back a page which is blocked by incapsula. Default: WebsiteResourceParser.
  • iframe_parser – ResourceParser class (not instance) to use when checking whether the iframe contains a captcha. Default: IframeResourceParser.
crack(resp, org=None, tries=0)

If the response is blocked by incapsula then set the necessary cookies and attempt to bypass it.

Parameters:
  • resp – Response to check.
  • org – Original response. Used only when called recursively.
  • tries – Number of attempts. Used only when called recursively.
Returns:

get(url, bypass_crack=False, **kwargs)

Override Session.get().

Parameters:
  • url – URL for the new Request object.
  • bypass_crack – Set this when sending a request that you don't want to go through the incapsula crack. It is used in this class so that sending a GET request from this instance doesn't create an infinite loop of .get() calling .crack(), which calls .get() again. Requests made to get incapsula resources also don't need to be cracked.
  • kwargs – Optional arguments that request takes.
Return type:

requests.Response

get_incapsula_resource_url(scheme, host)

Override this method to change the GET request after the cookies are set.

After the cookies are set, there is a GET request which must get sent to validate the session. Override this method to return a different url to send the GET request to. This method is more of a future proofing measure than anything.

Parameters:
  • scheme – ‘http’ or ‘https’.
  • host – The host of the incapsula resource url, e.g. 'www.example.com'.
incapsula.session.simple_digest(s)

Create a sum of the ordinal values of the characters passed in from s.

Parameters: s – The string to calculate the digest from.
Returns: Sum of ordinal values converted to a string.
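The description above can be sketched in one line. This is a reimplementation written from the docstring, not the package's own source:

```python
def simple_digest(s):
    """Sum the ordinal values of the characters in s and return it as a string."""
    return str(sum(ord(ch) for ch in s))
```

For example, "abc" has ordinals 97, 98 and 99, so the digest is "294".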
incapsula.session.test()

Quote each value in the tuple list and return a comma delimited string of the parameters.

This function is a shortened version of incapsula's test method. The original method checks for specific plugins in your browser and sets a cookie based on which extensions you have installed. The list of values is taken from my own browser after running the test method, so they are all valid.

This is a shortcut that avoids reverse engineering the entire original code.
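The quoting and joining step can be sketched as follows. This is an illustration under stated assumptions, not the package's code: it assumes each tuple is a (name, value) pair joined with `=` before being percent-encoded, it uses Python 3's urllib.parse.quote, and both the helper name `join_quoted` and the sample pairs are hypothetical. The real list of values is not reproduced here.

```python
from urllib.parse import quote


def join_quoted(pairs):
    """Percent-encode each 'name=value' pair and join them with commas."""
    # Assumption: each tuple is joined with '=' before being quoted.
    return ",".join(quote("=".join(pair)) for pair in pairs)


# Hypothetical sample values, not the list used by the package.
sample = [("navigator", "true"), ("a b", "c")]
result = join_quoted(sample)
```

With these sample pairs, `result` is `navigator%3Dtrue,a%20b%3Dc`, since quote() encodes both the `=` and the space.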

Module contents