title: GSoC 2014 - 2nd month
author: depierre
published: 2014-06-09
categories: Gsoc 2014, Python
keywords: gsoc, 2014, project, owasp, owtf, security, pentest, python, javascript, js, jquery, library

It is time for the second monthly post about my GSoC project, which is to implement an automated ranking system for [OWASP](https://www.owasp.org/index.php/Main_Page) - [OWTF](https://www.owasp.org/index.php/OWASP_OWTF). Today I am going to show the latest modifications I have made to the classful plugin system. Then I will present my new library, which aims to be *the* solution for my project: [PTP](https://github.com/owtf/ptp). I will finish with the new plugin report template that I created for OWTF.

# Classful Plugin System, the end of the road

During the first month of the GSoC, which is meant for the students and their mentors to get to know each other, I worked on a more efficient way to declare new attributes for a plugin. While doing so, I took a deeper look at how the current system worked and thought I could implement a better one. Long story short, I ended up writing a completely different, classful system. Each class has its own goal, such as running a shell command (an `ActivePlugin`) or retrieving data from the database (a `PassivePlugin`). In the previous post, I showed some snippets of the work in progress, and I think the system is now complete.

## What changed

I simplified the previous hierarchy by merging `AbstractRunCommandPlugin` into `ActivePlugin`. I figured that only an active plugin would ever need to run a shell command (like running [Arachni](http://www.arachni-scanner.com/) for instance), hence `AbstractRunCommandPlugin` was an extra layer with no real purpose. I also finished implementing the other classes: `PassivePlugin`, `SemiPassivePlugin`, `GrepPlugin` and `ExternalPlugin`.

## Automatically retrieve a class from a module

After declaring all of the classes, I was able to rewrite the plugins, and then I had to modify OWTF so that it could run them. I had already updated OWTF's plugin manager to dynamically load each plugin using the [`imp`](https://docs.python.org/2/library/imp.html) module. Now I had to modify it to automatically retrieve the new plugin class declaration, which was a little bit tricky.

Looking over the Internet, I found several topics describing the process (or at least a similar one): the [`inspect`](https://docs.python.org/2/library/inspect.html) module, for instance, and some workarounds using the [`dir()`](https://docs.python.org/2/library/functions.html#dir) function. After a couple of tests, I found that a simple `__dict__` lookup was enough for my needs. Then, [`issubclass`](https://docs.python.org/2/library/functions.html#issubclass) allowed me to find the object I was looking for.

Let us take an example based on the [following plugin](https://github.com/DePierre/owtf/blob/classful_plugin/plugins/web/active/Skipfish_Unauthenticated%40OWTF-WVS-006.py):

    :::python
    from framework.plugin.plugins import ActivePlugin


    class SkipfishUnauthPlugin(ActivePlugin):
        """Active Vulnerability Scanning without credentials via Skipfish."""

        RESOURCES = 'Skipfish_Unauth'

OWTF's classful plugin system will retrieve the `SkipfishUnauthPlugin` class using the following snippet:

    :::python
    from framework.plugin.plugins import AbstractPlugin

    class_name = []
    plugin_instance = []
    # The plugin_module is the dynamic module loaded by imp.
    for key, obj in plugin_module.__dict__.items():
        try:
            if (issubclass(obj, AbstractPlugin) and
                    obj.__module__ in pathname):
                class_name.append(key)
                plugin_instance.append(obj)
        except TypeError:  # The current dict value is not a class.
            pass

Since a plugin module file is light (i.e. each plugin contains a small number of functions, variables, etc.), iterating over its attributes is quick. OWTF then retrieves the `SkipfishUnauthPlugin` class because it is the only one that inherits from `AbstractPlugin`. The additional `obj.__module__ in pathname` test ensures that the class found is not the `ActivePlugin` class imported at the top of the plugin file. If you wonder why I use two lists to save the single class from the plugin file, it is just a precaution: if a user declares two classes inside one plugin file, OWTF will only keep the last one found.

Finally, using both the module reference and the class name of the plugin, OWTF can easily run it, as shown below:

    :::python
    def RunPlugin(self, plugin_dir, plugin, save_output=True):
        plugin_path = self.GetPluginFullPath(plugin_dir, plugin)
        (path, name) = os.path.split(plugin_path)
        plugin_output = None
        plugin_module = self.GetModule('', name, path + '/')
        # Create an instance of the plugin class.
        plugin_instance = plugin_module.__dict__[
            json.loads(plugin['attr'])['classname']](
            self.Core, plugin)
        # Run the plugin.
        return plugin_instance.run()

## Backward compatibility

At that point, the system was working well. I ran a couple of tests and did not see any weird behavior. Then I thought it would be nice to have both systems working together. I ensured backward compatibility thanks to the `classname` key of a plugin's attributes: when a plugin is loaded and no class is found, that key is set to `None`. Later, when trying to instantiate the class, if a `KeyError` is raised, I fall back on the older system. Here is the modification when loading a plugin:

    :::python
    # If no class has been found, try the old fashioned way.
    if not plugin_instance:
        attr = {'classname': None}
        try:
            attr.update(plugin_module.ATTR)
        except AttributeError:
            pass
        try:
            description = plugin_module.DESCRIPTION
        except AttributeError:
            description = ''
    else:
        # Keep in mind that a plugin SHOULD NOT have more than one
        # class. The default behaviour is to save the last found.
        class_name = class_name[-1]
        plugin_instance = plugin_instance[-1]
        attr = {'classname': class_name}
        try:
            # Did the plugin define an attr dict?
            attr.update(plugin_instance.ATTR)
        except AttributeError:
            pass
        try:
            description = plugin_instance.__doc__
        except AttributeError:
            description = ''
    attr = json.dumps(attr)
    # Save the plugin into the database
    # [. . .]

And when trying to run the plugin:

    :::python
    # First, try to find the class of the plugin (newest).
    try:
        # Create an instance of the plugin class.
        plugin_instance = plugin_module.__dict__[
            json.loads(plugin['attr'])['classname']](
            self.Core, plugin)
        # Run the plugin.
        plugin_output = plugin_instance.run()
    except KeyError:
        # Second, fall back on the old fashioned way.
        try:
            self.Core.log(
                'WARNING: Running the plugin ' + plugin_path +
                ' the old fashioned way.')
            plugin_output = plugin_module.run(self.Core, plugin)
        except AttributeError:
            pass
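For the record, "the old fashioned way" refers to the original plugin format: a bare module with no class at all, exposing a module-level `run(Core, plugin)` function (and optionally `DESCRIPTION`/`ATTR` constants). A minimal, made-up sketch of such a plugin, just for illustration:

    :::python
    DESCRIPTION = 'Hypothetical old-style plugin, for illustration only.'


    def run(Core, plugin):
        # Old-style plugins expose a bare module-level run() function;
        # the handler falls back to plugin_module.run(self.Core, plugin)
        # when no AbstractPlugin subclass is found in the module.
        output = []
        # ... build the plugin output here ...
        return output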
Now, let's start talking about the real GSoC project: the automated ranking system. :)

# PTP, my solution

I have to implement an automated ranking system for OWTF and, so far, the work I have presented does not address that at all. __[PTP](https://github.com/owtf/ptp)__ fixes that!

PTP is a standalone library that aims to parse the outputs of all pentesting tools. *By all, I mean that this is the final goal of the project.* That way, OWTF will use PTP in order to retrieve the ranking values of each of its plugins.

## The origins of the idea

My GSoC proposal did not say anything about a standalone library. The idea came up after discussing with Dan Cornell, who works on the [ThreadFix](https://github.com/denimgroup/threadfix/) project, which already provides such a feature. I asked him a couple of questions about how complete ThreadFix's parsing is, what difficulties it encountered, etc. He told me that the main problem is keeping up to date with each tool: vendors might change the output format of their reports from one version to another. He therefore thought a standalone library would be a stronger approach. A standalone library limits the collateral effects of such changes and allows a dedicated unit testing framework. In brief, a stronger project.

## Internal structure

The PTP library is currently organized according to the following scheme:

![PTP UML Diag v1](/static/images/gsoc2k14/ptp_uml_v1.png)

First of all, PTP defines the [`Info`](https://github.com/owtf/ptp/blob/master/libptp/info.py#L9) class, which represents a single result from any report. In order to achieve such modularity, it inherits from python's `dict` object. Here is how the `Info` class is defined:

    :::python
    class Info(dict):
        """Representation of a result from a report provided by a pentesting tool.

        + name: the name of the vulnerability
        + ranking: the ranking of the vulnerability.
        + description: the description of the vulnerability.
        + kwargs: any key/value attributes the vuln might contain

        """

        __getattr__ = dict.__getitem__
        __setattr__ = dict.__setitem__
        __delattr__ = dict.__delitem__

        def __init__(self, name=None, ranking=None, description=None, **kwargs):
            """Self-explanatory."""
            self.name = name
            self.ranking = ranking
            self.description = description
            for key, value in kwargs.items():
                self[key] = value

Each issue from the report must have at least a name, a ranking value and a description. Any other information is automatically added to the instance (like a [CVSS](http://www.first.org/cvss) link for the issue, for instance).

Then the [`AbstractReport`](https://github.com/owtf/ptp/blob/master/libptp/report.py#L14) class aggregates a list of `Info` instances. It also defines the basic behavior of a report, such as the [`is_mine`](https://github.com/owtf/ptp/blob/master/libptp/report.py#L49), [`get_highest_ranking`](https://github.com/owtf/ptp/blob/master/libptp/report.py#L86) and [`parse`](https://github.com/owtf/ptp/blob/master/libptp/report.py#L110) methods.

A quick word on the second one: this method was implemented mostly for OWTF, which needs to know the highest ranking value of a report in order to rank the plugins' outputs. Thanks to the `RANKING_SCALE` defined by each report class, it works on a homogeneous scale for rating the issues. Here is an example of a scale that maps [Skipfish](https://github.com/spinkham/skipfish)'s ranking onto PTP's unified one:

    :::python
    HIGH = 4
    MEDIUM = 3
    LOW = 2
    WARNINGS = 1
    INFO = 0

    # Convert the Skipfish's ranking scale to an unified one.
    RANKING_SCALE = {
        HIGH: constants.HIGH,
        MEDIUM: constants.MEDIUM,
        LOW: constants.LOW,
        WARNINGS: constants.INFO,
        INFO: constants.INFO}

Then, for each tool PTP supports, a new class is defined that inherits from this abstract class. Its purpose is to redefine the `is_mine` and `parse` methods to deal with the specificities of the tool, roughly following the skeleton sketched below.
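As a rough sketch (the tool name, file pattern and import paths below are assumptions of mine, not an actual PTP report class), such a tool-specific class boils down to the following:

    :::python
    from libptp import constants
    from libptp.info import Info
    from libptp.report import AbstractReport


    class MyToolReport(AbstractReport):
        """Hypothetical report class for a tool called 'mytool'."""

        __tool__ = 'mytool'
        # Map the tool's own severity names onto PTP's unified scale.
        RANKING_SCALE = {
            'high': constants.HIGH,
            'medium': constants.MEDIUM,
            'low': constants.LOW,
            'info': constants.INFO}

        @classmethod
        def is_mine(cls, pathname, filename='*.txt'):
            """Return True if the report found in pathname comes from mytool."""
            found = cls._recursive_find(pathname, filename)
            # A real implementation would also inspect the file content.
            return bool(found)

        def parse(self, pathname=None):
            """Read the report and fill self.vulns with Info instances."""
            self.vulns.append(
                Info(name='Example issue',
                     ranking=self.RANKING_SCALE['high'],
                     description='Placeholder issue, for illustration only.'))
            return self.vulns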
For instance, Arachni has the following `is_mine` implementation:

    :::python
    @classmethod
    def is_mine(cls, pathname, filename='*.xml'):
        """Check if it is an Arachni report and if I can handle it.

        Return True if it is mine, False otherwise.

        """
        fullpath = cls._recursive_find(pathname, filename)
        if not fullpath:
            return False
        fullpath = fullpath[0]  # Only keep the first file.
        if not fullpath.endswith('.xml'):
            return False
        try:
            root = etree.parse(fullpath).getroot()
        except LxmlError:  # Not a valid XML file.
            return False
        return cls._is_arachni(root)

    @classmethod
    def _is_arachni(cls, xml_report):
        """Check if the xml_report comes from Arachni.

        Returns True if it is from Arachni, False otherwise.

        """
        if not cls.__tool__ in xml_report.tag:
            return False
        return True

For tools that do not rank their discoveries, PTP uses a dictionary that acts as a signatures database. This is the case with [Wapiti](http://wapiti.sourceforge.net/):

    :::python
    # TODO: Complete the signatures database.
    SIGNATURES = {
        # High ranked vulnerabilities
        'SQL Injection': HIGH,
        'Blind SQL Injection': HIGH,
        'Command execution': HIGH,

        # Medium ranked vulnerabilities
        'Htaccess Bypass': MEDIUM,
        'Cross Site Scripting': MEDIUM,
        'CRLF Injection': MEDIUM,

        # Low ranked vulnerabilities
        'File Handling': LOW,  # a.k.a Path or Directory listing
        'Resource consumption': LOW,

        # Informational ranked vulnerabilities
        'Backup file': INFO,
        'Potentially dangerous file': INFO,
        'Internal Server Error': INFO,
        }

PTP then checks whether each issue from the report exists in the dictionary and retrieves the corresponding risk:

    :::python
    class WapitiReport(AbstractReport):
        """Retrieve the information of a Wapiti report."""

        # [. . .]

        def parse_xml_report(self):
            """Retrieve the results from the report."""
            vulns = self.root.find('.//vulnerabilities')
            for vuln in vulns.findall('.//vulnerability'):
                vuln_signature = vuln.get('name')
                vuln_description = vuln.find('.//description')
                if vuln_signature in SIGNATURES:
                    info = Info(
                        name=vuln_signature,
                        ranking=SIGNATURES[vuln_signature],
                        description=vuln_description.text.strip(' \n'),
                        )
                    self.vulns.append(info)

Last, the [`PTP`](https://github.com/owtf/ptp/blob/master/ptp.py#L15) class acts as a python decorator: its only job is to offer a unique way to use the library. When calling its `parse` method, you are actually calling the `parse` method of the report class of a specific tool (hence my use of the term decorator). The `PTP` class is also able to auto-detect the tool that generated the report. In order to find the correct tool, it calls the `is_mine` method of each tool report class it supports.

    :::python
    class PTP(object):
        """PTP class definition.

        Usage:
            ptp = PTP()
            ptp.parse(path_to_report)

        """

        # Reports for supported tools.
        supported = {
            'arachni': ArachniReport,
            'skipfish': SkipfishReport,
            'w3af': W3AFReport,
            'wapiti': WapitiReport}

        def __init__(self, tool_name=None):
            self.tool_name = tool_name
            self.report = None

        def parse(self, pathname=None):
            if self.tool_name is None:
                for tool in self.supported.values():
                    if tool.is_mine(pathname):
                        self.report = tool()
                        break
            else:
                try:
                    self.report = self.supported[self.tool_name]()
                except KeyError:
                    pass
            if self.report is None:
                raise NotSupportedToolError('This tool is not supported by PTP.')
            return self.report.parse(pathname)

As you can see, the user can specify the tool via a string when instantiating the class.
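For example, forcing the Wapiti parser instead of relying on auto-detection would look like this (the report path is obviously made up):

    :::pycon
    >>> from ptp import PTP
    >>> ptp = PTP('wapiti')  # Force the Wapiti report class, skip auto-detection.
    >>> ptp.parse(pathname='path/to/the/wapiti/report/directory')
    # [. . .]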
And here is an example of how to use the library in auto-detection mode:

    :::pycon
    >>> from ptp import PTP
    >>> ptp = PTP()  # Init PTP in auto-detect mode.
    >>> ptp.parse(pathname='path/to/the/report/directory')
    # [. . .]
    >>> print(ptp.get_highest_ranking())
    0  # 0 means that the highest known severity is HIGH.

## What I would like to change

Later, I would like to update PTP's internal structure, since each tool will have to support different output formats and different versions. That is why I wish to change its structure to something similar to the following:

![PTP UML Diag v2](/static/images/gsoc2k14/ptp_uml_v2.png)

With this new structure, a report will no longer depend on the output format, nor on the version of the tool. It will instantiate different parsing methods that take care of these problems.

## Current state of development

At this moment, PTP is more a proof of concept than a completely stable/finished project. To be precise, PTP only supports four different tools (Arachni, Skipfish, [W3AF](http://w3af.org/) and Wapiti) and the parsing methods have only been tested against one version of each tool. Plus, it only retrieves basic information about each issue. I decided to develop the library that way because I needed to focus the early development in order to have quick results for OWTF. The master branch of the project is considered stable and any further modification will be done on the develop branch. Since PTP is reused by OWTF, this gives my project better stability.

# Integration of PTP into OWTF

In order to have OWTF use PTP, I had to integrate the library, which I managed to do thanks to the following changes. First, I needed to update OWTF's `poutput_manager` so that it can save the `owtf_rank` value:

    :::diff
     def SavePluginOutput(self, plugin, output, duration,
    -                     target=None):
    +                     target=None,
    +                     owtf_rank=None):
         """Save into the database the command output of the plugin `plugin`."""
         session = self.Core.DB.Target.GetOutputDBSession(target)
         session = session()
         session.merge(models.PluginOutput(
             key = plugin["key"],
             plugin_code = plugin["code"],
             plugin_group = plugin["group"],
             plugin_type = plugin["type"],
             output = json.dumps(output),
             start_time = plugin["start"],
             end_time = plugin["end"],
             execution_time = duration,
             status = plugin["status"],
    -        output_path = plugin["output_path"])
    +        output_path = plugin["output_path"],
    +        owtf_rank=owtf_rank)
    +        )
         session.commit()
         session.close()

Then I had to change a few things in the `plugin_handler`'s `ProcessPlugin` function:

    :::diff
     def ProcessPlugin(self, plugin_dir, plugin, status={}):
         # [. . .]
         try:
             output = self.RunPlugin(plugin_dir, plugin)
             plugin['status'] = 'Successful'
             plugin['end'] = self.Core.Timer.GetEndDateTimeAsStr('Plugin')
    +        owtf_rank = None
    +        try:
    +            parser = PTP()
    +            parser.parse(pathname=plugin['output_path'])
    +            owtf_rank = 3 - parser.get_highest_ranking()
    +        except PTPError:  # Not supported tool or report not found.
    +            pass
             status['SomeSuccessful'] = True
             self.Core.DB.POutput.SavePluginOutput(
                 plugin,
                 output,
    -            self.Core.Timer.GetElapsedTimeAsStr('Plugin'))
    +            self.Core.Timer.GetElapsedTimeAsStr('Plugin'),
    +            owtf_rank=owtf_rank)
             return output
         except KeyboardInterrupt:
             # [. . .]

And that is all! I thought the integration would take more time, but it was in fact quite easy.
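Outside of the diff, the ranking logic added to `ProcessPlugin` boils down to the following standalone sketch (the helper name `rank_plugin_output` and the `PTPError` import path are mine, not OWTF's or PTP's):

    :::python
    from ptp import PTP
    from libptp.exceptions import PTPError  # Assumed location of PTPError.


    def rank_plugin_output(output_path):
        """Return the OWTF rank of a plugin output directory, or None."""
        try:
            parser = PTP()  # Auto-detect the tool that wrote the report.
            parser.parse(pathname=output_path)
            # PTP's unified scale is inverted (0 means HIGH), which is why
            # the value is converted with 3 - x before being stored.
            return 3 - parser.get_highest_ranking()
        except PTPError:  # Not supported tool or report not found.
            return None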
# User-friendly display of the rankings

So far so good. In a couple of weeks, I have managed to create a standalone library that is my solution for the automated ranking system. It has been successfully integrated into OWTF, but I still had to update the template of the plugin report.

## The older version of the template

Before describing my modifications, here is the previous template:

![OWTF's older plugin template version](/static/images/gsoc2k14/template_old_version.png)

As you can see, nothing helps the user focus on the seemingly weakest spots of the tested application.

## The modification

In order to help the user, OWTF needed a template where each automated ranking value would appear. The system is hierarchical, following these rules:

1. A ranking is set for each plugin's output.
2. The maximum ranking value of the plugins' outputs is assigned to the test case (note: a test case is a group of plugin outputs, since a plugin can be passive, active, external, semi-passive, etc.).
3. The maximum ranking value of the test cases is assigned to the report.
4. If a ranking value comes from OWTF, it has to be highlighted as such.

These simple rules implied a much more complex problem: the ranking value of the report and of a test case might change when a single plugin's output is re-ranked. That means I had to implement a template where the ranking values are automatically updated, first when the page is loaded, and second when a ranking is overridden by the user.

First, I decided to create __a local copy__ of the plugins' outputs. I made that decision __in order to optimize__ the rendering system: with a local copy, __there is no need to call OWTF's API to retrieve the outputs each time the user changes a ranking.__

    :::js
    /* Global copy of the plugins' outputs in order to avoid calling the API
     * each time the user changes a rank. */
    var copyPOutputs = {};

Most of my time was spent debugging why such and such function was not called, why such and such switch test was not working, etc. For example, in order to update the labels when loading the page, I wrote a function that is called when the document is ready. With jQuery, the syntax is the following:

    :::js
    /* This function is called as soon as the document is ready. */
    $(function() {
        /* Do something. */
    });

When I wrote my code, __some functions were not called__ and I still don't understand why:

    :::js
    $(function() {
        var req = $.getJSON(pluginSpace.poutput_api_url, function(data) {
            /* Global copy of the POutputs. */
            for (var i = 0; i < data.length; i++) {
                if (!copyPOutputs[data[i].plugin_code])
                    copyPOutputs[data[i].plugin_code] = {};
                copyPOutputs[data[i].plugin_code][data[i].plugin_type] = data[i];
            }
        });
        /* Update all the colors. */
        updateTargetInfo();  /* Never called. */
        updateAllTestCasesInfo();  /* Never called. */
    });

But __this version worked like a charm__:

    :::diff
     $(function() {
         var req = $.getJSON(pluginSpace.poutput_api_url, function(data) {
             /* Global copy of the POutputs. */
             for (var i = 0; i < data.length; i++) {
                 if (!copyPOutputs[data[i].plugin_code])
                     copyPOutputs[data[i].plugin_code] = {};
                 copyPOutputs[data[i].plugin_code][data[i].plugin_type] = data[i];
             }
         });
         /* Update all the colors. */
    +    req.complete(function() {
             updateTargetInfo();
             updateAllTestCasesInfo();
    +    });
     });

In the second, working version, I used the [`.complete()`](https://api.jquery.com/jquery.getjson/) function. It forces the calls to both `updateTargetInfo()` and `updateAllTestCasesInfo()` to happen as soon as the request has completed. *Note: the `complete()` function has been deprecated since jQuery 1.8; `always()` should be used instead.*
My question is: why were these functions not called in the first version? *Mystery*.

## The new version

After all these modifications, OWTF now has a nicer template:

![OWTF's newer plugin template version](/static/images/gsoc2k14/template_new_version.png)

Now the user can quickly see which spots he should check first. Then he may override the ranking values as he wishes, thereby producing the final ranking values of the test. The automated ranking values are highlighted using the [`opacity`](http://www.w3schools.com/css/css_image_transparency.asp) CSS attribute, as shown in the Skipfish test case.

All in all, it took me around 8 + 4 straight hours of work to achieve the last render (plus a couple of modifications once I got the feedback). This is a lot for such small graphical changes, but jQuery is really tricky (especially when you have never written a single line of JS before).

# Conclusion

Wow, finally! That took longer to present than I expected. The past four weeks were quite full. I worked a lot on my project and __I am really proud to see that most of the painful work has been done__. I looked at my proposal and saw that __I have almost finished *Stage 1* and *Stage 3*__:

+ PTP retrieves the ranking values from the reports of different tools.
+ The plugin report template has been updated to graphically display the ranking values retrieved by PTP.
+ __It works!__

Of course, *Stage 2* will take a lot of time. It concerns the tools supported by PTP: for now, PTP only supports four tools whereas OWTF uses many more, but since I have a working PoC, I think it will be easy to add support for the others. Plus, my proposal also presents side projects that I would like to do, but only if I have time left once the main project is done.

With my exams coming soon, I will focus on my studies for the next couple of weeks, but I hope to have something interesting for the third monthly post :)