
Processing Modules
******************

Cuckoo's processing modules are Python scripts that let you define
custom ways to analyze the raw results generated by the sandbox and
append some information to a global container that will be later used
by the signatures and the reporting modules.

You can create as many modules as you want, as long as they follow a
predefined structure that we will present in this chapter.


Global Container
================

After an analysis is completed, Cuckoo will invoke all the processing
modules available in the *modules/processing/* directory. Any
additional module you decide to create must be placed inside that
directory.

Every module should also have a dedicated section in the file
*conf/processing.conf*. For example, if you create a module
*modules/processing/foobar.py*, you will have to append the following
section to *conf/processing.conf*:

   [foobar]
   enabled = yes
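
As a rough illustration of how such a section can be read, the sketch
below parses the same snippet with the "configparser" module from the
Python 3 standard library. It relies only on the INI-style layout
shown above and is not Cuckoo's own configuration loader; the "SAMPLE"
string stands in for the real file.

```python
import configparser

# The section shown above, as it would appear in conf/processing.conf.
SAMPLE = """
[foobar]
enabled = yes
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# getboolean() understands yes/no, on/off, true/false and 1/0.
foobar_enabled = parser.getboolean("foobar", "enabled")
print(foobar_enabled)
```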

Every module is then initialized and executed, and the data it
returns is appended to a data structure that we'll call the
**global container**.

This container is simply a big Python dictionary that holds the
abstracted results produced by all the modules, indexed by their
identification key.
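
To make the idea concrete, here is a minimal, self-contained sketch of
how such a dictionary could be assembled. The "Processing" class
below is a stand-in defined only so the example runs without Cuckoo,
and "build_container()", the two example modules, their keys and
their return values are all hypothetical; Cuckoo's real loader may
work differently.

```python
class Processing(object):
    """Stand-in for Cuckoo's base class (assumption, not the real one)."""
    key = None

    def run(self):
        raise NotImplementedError


class TargetInfo(Processing):
    def run(self):
        self.key = "target"  # hypothetical key
        return {"file_name": "sample.exe"}  # made-up data


class Strings(Processing):
    def run(self):
        self.key = "strings"  # hypothetical key
        return ["MZ", "This program cannot be run in DOS mode."]


def build_container(modules):
    """Run each module and store its data under the module's key."""
    container = {}
    for module_class in modules:
        module = module_class()
        data = module.run()
        # run() sets the key, so it is read after the call.
        container[module.key] = data
    return container


results = build_container([TargetInfo, Strings])
print(sorted(results))
```

Signatures and reporting modules would then look up the data they
need by key, for example "results["target"]".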

Cuckoo already provides a default set of modules which will generate a
*standard* global container. It's important for the existing reporting
modules (HTML report etc.) that these default modules are not
modified, otherwise the resulting global container structure would
change and the reporting modules wouldn't be able to recognize it and
extract the information used to build the final reports.

The currently available default processing modules are:
   * **AnalysisInfo** *(modules/processing/analysisinfo.py)* -
     generates some basic information on the current analysis, such as
     timestamps, version of Cuckoo and so on.

   * **BehaviorAnalysis** *(modules/processing/behavior.py)* -
     parses the raw behavioral logs and performs some initial
     transformations and interpretations, including complete process
     tracing, a behavioral summary and a process tree.

   * **Debug** *(modules/processing/debug.py)* - includes errors and
     the *analysis.log* generated by the analyzer.

   * **Dropped** *(modules/processing/dropped.py)* - includes
     information on the files dropped by the malware and dumped by
     Cuckoo.

   * **Memory** *(modules/processing/memory.py)* - executes
     Volatility on a full memory dump.

   * **NetworkAnalysis** *(modules/processing/network.py)* - parses
     the PCAP file and extracts some network information, such as DNS
     traffic, domains, IPs, HTTP requests, IRC and SMTP traffic.

   * **ProcMemory** *(modules/processing/procmemory.py)* - performs
     analysis of process memory dumps. **Note**: the module can
     process user-defined Yara rules from
     *data/yara/memory/index_memory.yar*. Just edit this file to add
     your Yara rules.

   * **StaticAnalysis** *(modules/processing/static.py)* - performs
     some static analysis of PE32 files.

   * **Strings** *(modules/processing/strings.py)* - extracts
     strings from the analyzed binary.

   * **TargetInfo** *(modules/processing/targetinfo.py)* - includes
     information on the analyzed file, such as hashes.

   * **VirusTotal** *(modules/processing/virustotal.py)* - searches
     VirusTotal.com for antivirus signatures of the analyzed file.
     **Note**: the file is not uploaded to VirusTotal.com; if it was
     not previously uploaded to the website, no results will be
     retrieved.


Getting started
===============

In order to make them available to Cuckoo, all processing modules must
be placed inside the folder at *modules/processing/*.

A basic processing module could look like:

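      from lib.cuckoo.common.abstracts import Processing

      class MyModule(Processing):

          def run(self):
              # Name of the sub container in the global container.
              self.key = "key"
              # do_something() is a placeholder for your own logic.
              data = do_something()
              return data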

Every processing module should contain:
   * A class inheriting "Processing".

   * A "run()" method.

   * A "self.key" attribute defining the name to be used as a sub
     container for the returned data.

   * A set of data (list, dictionary, string, etc.) that will be
     appended to the global container.

You can also specify an "order" value, which lets you run the
available processing modules in an ordered sequence. By default all
modules have an "order" value of "1" and are executed in
alphabetical order.

If you want to change this value, your module would look like:

      from lib.cuckoo.common.abstracts import Processing

      class MyModule(Processing):
          order = 2

          def run(self):
              self.key = "key"
              data = do_something()
              return data
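
The effect of "order" can be pictured with a small sort. The sketch
below is an assumption about how the sequence could be computed, not
Cuckoo's actual scheduler: it sorts by "order" first and breaks ties
alphabetically by class name. The "Processing" stand-in, the three
module classes and "execution_order()" are hypothetical.

```python
class Processing(object):
    """Stand-in base class; the default order of 1 comes from the text."""
    order = 1


class Alpha(Processing):
    pass


class Beta(Processing):
    order = 2  # runs after every module left at the default


class Gamma(Processing):
    pass


def execution_order(modules):
    """Sort by "order", then alphabetically by class name (assumption)."""
    return sorted(modules, key=lambda module: (module.order, module.__name__))


names = [module.__name__ for module in execution_order([Gamma, Beta, Alpha])]
print(names)
```

Here "Alpha" and "Gamma" keep the default value and run first, in
alphabetical order, while "Beta" runs last.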

You can also manually disable a processing module by setting the
"enabled" attribute to "False":

      from lib.cuckoo.common.abstracts import Processing

      class MyModule(Processing):
          enabled = False

          def run(self):
              self.key = "key"
              data = do_something()
              return data

The processing modules are provided with some attributes that can be
used to access the raw results for the given analysis:

   * "self.analysis_path": path to the folder containing the results
     (e.g. *storage/analysis/1*)

   * "self.log_path": path to the *analysis.log* file.

   * "self.conf_path": path to the *analysis.conf* file.

   * "self.file_path": path to the analyzed file.

   * "self.dropped_path": path to the folder containing the dropped
     files.

   * "self.logs_path": path to the folder containing the raw
     behavioral logs.

   * "self.shots_path": path to the folder containing the
     screenshots.

   * "self.pcap_path": path to the network pcap dump.

   * "self.memory_path": path to the full memory dump, if created.

   * "self.pmemory_path": path to the process memory dumps, if
     created.

With these attributes you should be able to easily access all the raw
results stored by Cuckoo and perform your analytic operations on them.
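
As an example of using these attributes, the sketch below walks
"self.dropped_path" and records the size of every dropped file. It is
self-contained so it can run without Cuckoo: the "Processing"
stand-in, the "DroppedSizes" module, its key, and the "files" folder
name used for the demo directory are all assumptions made for
illustration.

```python
import os
import shutil
import tempfile


class Processing(object):
    """Stand-in for Cuckoo's base class; sets only the paths used here."""

    def __init__(self, analysis_path):
        self.analysis_path = analysis_path
        # The "files" folder name is an assumption for this demo.
        self.dropped_path = os.path.join(analysis_path, "files")


class DroppedSizes(Processing):
    def run(self):
        self.key = "dropped_sizes"  # hypothetical key
        sizes = {}
        for name in sorted(os.listdir(self.dropped_path)):
            path = os.path.join(self.dropped_path, name)
            if os.path.isfile(path):
                sizes[name] = os.path.getsize(path)
        return sizes


# Build a throwaway analysis folder containing one 16-byte dropped file.
analysis_path = tempfile.mkdtemp()
os.mkdir(os.path.join(analysis_path, "files"))
with open(os.path.join(analysis_path, "files", "payload.bin"), "wb") as handle:
    handle.write(b"\x00" * 16)

module = DroppedSizes(analysis_path)
sizes = module.run()
print(sizes)

# Remove the temporary folder once the module has run.
shutil.rmtree(analysis_path)
```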

As a last note, a good practice is to raise the
"CuckooProcessingError" exception whenever the module encounters an
issue you want to report to Cuckoo. Import the class and raise it
like this:

      from lib.cuckoo.common.exceptions import CuckooProcessingError
      from lib.cuckoo.common.abstracts import Processing

      class MyModule(Processing):

          def run(self):
              self.key = "key"

              try:
                  data = do_something()
              # SomethingFailed is a placeholder for whatever
              # exception your own code raises.
              except SomethingFailed:
                  raise CuckooProcessingError("Failed")

              return data
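
To show why a dedicated exception is useful, the sketch below runs a
few modules in a loop, skips the ones whose "enabled" attribute is
"False", and logs a "CuckooProcessingError" instead of aborting. How
Cuckoo itself reacts to the exception is not described here, so treat
this as one plausible arrangement. The exception and base class are
stand-ins, and "run_all()" and the three modules are hypothetical.

```python
import logging

log = logging.getLogger("processing_sketch")


class CuckooProcessingError(Exception):
    """Stand-in for the exception in lib.cuckoo.common.exceptions."""


class Processing(object):
    """Stand-in base class with the attributes described in the text."""
    enabled = True
    key = None


class Working(Processing):
    def run(self):
        self.key = "working"
        return {"status": "ok"}


class Disabled(Processing):
    enabled = False

    def run(self):
        self.key = "disabled"
        return {}


class Failing(Processing):
    def run(self):
        self.key = "failing"
        try:
            return int("not a number")
        except ValueError as error:
            raise CuckooProcessingError("Failed: %s" % error)


def run_all(modules):
    """Collect results, skipping disabled modules and logging failures."""
    container, failures = {}, []
    for module_class in modules:
        if not module_class.enabled:
            continue
        module = module_class()
        try:
            data = module.run()
        except CuckooProcessingError as error:
            log.warning("%s failed: %s", module_class.__name__, error)
            failures.append(module_class.__name__)
            continue
        container[module.key] = data
    return container, failures


container, failures = run_all([Working, Disabled, Failing])
print(container, failures)
```

Because only "CuckooProcessingError" is caught, an unexpected bug in
a module would still surface as a normal traceback in this sketch.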
