title: GSoC 2014 - 3rd month
author: depierre
published: 2014-07-02
categories: Gsoc 2014, Python
keywords: gsoc, 2014, project, owasp, owtf, security, pentest, python

The third month of GSoC 2014 is now over. Sadly, I had a lot of exams these last couple of weeks (school projects, oral presentations, theoretical exams, etc.), which means that this post will be brief. Nevertheless, I managed to improve [PTP](https://github.com/owtf/ptp)'s architecture and I think this could be interesting. I also completed the documentation of the project using [Sphinx](http://sphinx-doc.org/), not that anyone cares.

# Thoughts on PTP's architecture

## The limits of the previous architecture

As you may have read in my previous post about my GSoC project, I found that PTP's previous architecture would become hard to maintain as the project grew. Below is the previous architecture:

![PTP UML Diag v1](/static/images/gsoc2k14/ptp_uml_v1.png)

Since the report and the parser of one tool are defined in a single class, the more versions and report formats of the same tool PTP has to support, the harder it becomes to keep the code clean and stable.

That is why I thought it would be better for PTP to follow a new architecture in which the report and the parsing functions are split into two distinct classes, as in the following diagram:

![PTP UML Diag v2](/static/images/gsoc2k14/ptp_uml_v2.png)

While implementing this architecture, I thought it would be even better to factor the code that opens and closes the report files out into another layer. Therefore, each parser dealing with an XML report inherits from a specialized XML abstract parser class. The same goes for JSON, HTML, etc.

## PTP's new architecture

First, PTP's tool support was not extended at all. It still supports only four different tools (and only a specific version of each).
But I think that with the new architecture I implemented, it will be easier for me to add support for new tools, which is the main mission for the next month ;)

Anyway, PTP's new architecture implements the extra layers I presented in the previous post, plus the one from the previous section. The complete architecture is shown below:

![PTP UML Diag v3](/static/images/gsoc2k14/ptp_uml_v3.png)

With the `AbstractReport` class comes the `AbstractParser` one. Functions like `is_mine` and `check_version` have been moved to the parser. They still exist in the report but now act as proxies for their parser counterparts.

The `AbstractParser` defines a new function, `handle_file`, which creates a *stream* on the report data. I use the word *stream* because it might be anything, from an XML handle to a list containing each line of the report. Of course, since `AbstractParser` is an abstract layer and `handle_file` is format-specific, each specialized parser has to override it.

Speaking of XML, I implemented an `XMLParser` class that overrides the abstract `handle_file` function:

    :::python
    from lxml import etree

    # AbstractParser is PTP's abstract parser class (its import is omitted here).


    class XMLParser(AbstractParser):
        """Specialized parser for XML formatted report."""

        #: str -- XMLParser only supports XML files.
        __format__ = 'xml'

        def __init__(self, pathname):
            """Initialize XMLParser.

            :param str pathname: path to the report file.

            """
            AbstractParser.__init__(self, pathname)

        @classmethod
        def handle_file(cls, pathname):
            """Specialized file handler for XML files.

            :param str pathname: path to the report file.
            :raises ValueError: if the report file has not the right extension.
            :raises LxmlError: if Lxml cannot parse the XML file.

            """
            # Reject anything that does not end with the supported format.
            if not pathname.endswith(cls.__format__):
                raise ValueError(
                    "This parser only supports '%s' files" % cls.__format__)
            # The "stream" handed to the parser is the root element of the tree.
            return etree.parse(pathname).getroot()

As you can see, in this context, a *stream* is the root element of a [*lxml*](http://lxml.de/) tree.

I make an exception for *Skipfish* because its parsing is unique.
Instead of having one report file, it generates a tree of directories. The simplest way I found to retrieve the ranking values is to read only the *issue_samples* file. But this is a *js* file and I had to write an ugly hack to read it. That is why there is no specialized parser class between the abstract parser and the *Skipfish* one.

# Simplification

## Useless `Info` class

Apart from the changes to the architecture, I tried to take an honest look at the current implementation of PTP. I was trying to answer questions like *Is this pertinent?*, *Is it mandatory to have this?*, *Could it be simpler?*, etc.

I read a couple of articles about Python, what to do and what to avoid, and I stumbled on [a talk about classes](http://youtu.be/o9pEzgHorH0) and how often we could avoid them. Then I took a look at my [`Info` class](https://github.com/owtf/ptp/blob/33f3f42afb3d051f6f6c4828d167fa49b1fb8fff/libptp/info.py) and realized that it was exactly a case shown in the video.

Jack Diederich explains that instead of writing a class that inherits from a standard Python type (`dict` in my case) because it might offer something more later, I should use the standard type directly and only write that class when I actually need it. Therefore I removed the `Info` class, which was in fact just a dictionary, and replaced each occurrence with a simple `dict`.

## Use iterators whenever possible

I also realized that I was using copy-based functions too often. For instance, I used `dict.values` when looping over dictionary values. But in Python 2.x, this function builds a list copy of the values before iterating over them, which wastes memory when dealing with big dictionaries. Therefore, I replaced each function of this kind with its iterator version (e.g. `values` with `itervalues`).

But in Python 3.x, this has changed and functions like `itervalues` have been folded into `values` (i.e.
the default `values` no longer copies the values and returns a lazy view instead). So, in order to stay compatible with both 2.x and 3.x while still avoiding the copy, I modified PTP's code to decide wisely which one to use. This wisdom comes from snippets like the following:

    :::python
    @classmethod
    def check_version(cls, metadata, key='version'):
        """Checks the version from the metadata against the supported ones.

        :param dict metadata: The metadata in which to find the version.
        :param str key: The :attr:`metadata` key containing the version value.

        :return: `True` if it supports that version, `False` otherwise.
        :rtype: :class:`bool`

        """
        try:
            # Python 2: iterate lazily instead of copying the values.
            parsers = cls.__parsers__.itervalues()
        except AttributeError:  # Python3 then.
            parsers = cls.__parsers__.values()
        return metadata[key] in parsers

# Documentation using sphinx

The last point I worked on this month is the documentation for PTP. A couple of weeks earlier, I raised the point that [OWTF](https://www.owasp.org/index.php/OWASP_OWTF) should use a tool that automatically generates the documentation from the docstrings. We were hesitating between [doxygen](http://www.stack.nl/~dimitri/doxygen/) and [sphinx](http://sphinx-doc.org/).

Since I had already used doxygen in other projects, I thought I could try sphinx on PTP as a beta test before deciding which one to use in OWTF. I therefore spent a day or two writing the full technical documentation for PTP, as you may have guessed from the snippets in this post.

I was glad to see that configuring sphinx for a project is pretty easy and that its syntax is really simple, but the same is true of doxygen. For now, I think both are equal, and this beta test did not give me enough to decide which one to use for OWTF.

Another point I would like to briefly discuss is that I really want to have something clean for PTP.
It is for now the cleanest project I have created, in my opinion, and I want to keep it as clean as possible. That is why, after a friend of mine sent me the repository of [SyncThing](https://github.com/calmh/syncthing), which is one of the cleanest repos I have ever seen, I decided to follow the same release system. [The release I have pushed](https://github.com/owtf/ptp/releases/tag/v0.1.0) and every following one will therefore follow the [Semantic Versioning Guidelines](http://semver.org/).

# Conclusion

I cannot really conclude on anything after this month. With my exams, I did not have much free time to work fully on my GSoC project. Nevertheless, I managed to update PTP's architecture as I aimed to do in my previous post. I feel confident about the robustness of the new one and I think that it will ease any further work.

I also simplified the code base, ensured compatibility with both Python 2.x and 3.x, and was able to configure sphinx and write the technical documentation of the project.

I hope I will have more to say in the next monthly GSoC post :)
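
For the record, the 2.x/3.x compatibility mentioned above mostly comes down to one `try`/`except` around the dictionary access. The sketch below is only an illustration of that idea: `itervalues` is a hypothetical helper name and `supported` is made-up data, neither of them is part of PTP's code base.

```python
def itervalues(mapping):
    """Iterate over a dict's values without copying them on Python 2 or 3."""
    try:
        # Python 2: dict.itervalues() yields the values lazily.
        return mapping.itervalues()
    except AttributeError:
        # Python 3: dict.values() is already a view, so just wrap it.
        return iter(mapping.values())


# Hypothetical registry, loosely in the spirit of a parser/version mapping.
supported = {'parser_a': '1.0', 'parser_b': '2.1'}
is_supported = '2.1' in itervalues(supported)
```

Keeping the fallback in one helper like this would avoid repeating the `try`/`except` in every function that loops over a dictionary.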