
Signatures
**********

With Cuckoo you're able to create some customized signatures that you
can run against the analysis results in order to identify some
predefined pattern that might represent a particular malicious
behavior or an indicator you're interested in.

These signatures are very useful to give a context to the analyses:
both because they simplify the interpretation of the results as well
as for automatically identifying malware samples of interest.

Some examples of what you can use Cuckoo's signatures for:
   * Identify a particular malware family you're interested in by
     isolating some unique behaviors (like file names or mutexes).

   * Spot interesting modifications the malware performs on the
     system, such as installation of device drivers.

   * Identify particular malware categories, such as Banking Trojans
     or Ransomware by isolating typical actions commonly performed by
     those.

   * Classify samples into the categories malware/unknown (it is not
     possible to identify clean samples)

You can find signatures created by us and by other Cuckoo users on our
Community repository.


Getting started
===============

Creation of signatures is a very simple process and requires just a
decent understanding of Python programming.

First things first, all signatures must be located inside
*modules/signatures/*.

The following is a basic example signature:

      from lib.cuckoo.common.abstracts import Signature

      class CreatesExe(Signature):
          name = "creates_exe"
          description = "Creates a Windows executable on the filesystem"
          severity = 2
          categories = ["generic"]
          authors = ["Cuckoo Developers"]
          minimum = "2.0"

          def on_complete(self):
              return self.check_file(pattern=".*\\.exe$",
                                     regex=True)

As you can see the structure is really simple and consistent with the
other modules. We're going to get into details later, but since
version 1.2 Cuckoo provides some helper functions that make the
process of creating signatures much easier.

In this example we just walk through all the accessed files in the
summary and check if there is anything ending with "*.exe*": in that
case it will return "True", meaning that the signature matched,
otherwise return "False".

the function "on_complete" is called at the end of the cuckoo
signature process. Other function will be called before on specific
events and help you to write more sophisticated and faster signatures.

In case the signature gets matched, a new entry in the "signatures"
section will be added to the global container as follows:

   "signatures": [
       {
           "severity": 2,
           "description": "Creates a Windows executable on the filesystem",
           "alert": false,
           "references": [],
           "data": [
               {
                   "file_name": "C:\\d.exe"
               }
           ],
           "name": "creates_exe"
       }
   ]


Creating your new signature
===========================

In order to make you better understand the process of creating a
signature, we are going to create a very simple one together and walk
through the steps and the available options. For this purpose, we're
simply going to create a signature that checks whether the malware
analyzed opened a mutex named "i_am_a_malware".

The first thing to do is import the dependencies, create a skeleton
and define some initial attributes. These are the ones you can
currently set:

   * "name": an identifier for the signature.

   * "description": a brief description of what the signature
     represents.

   * "severity": a number identifying the severity of the events
     matched (generally between 1 and 3).

   * "categories": a list of categories that describe the type of
     event being matched (for example "*banker*", "*injection*" or
     "*anti-vm*").

   * "families": a list of malware family names, in case the
     signature specifically matches a known one.

   * "authors": a list of people who authored the signature.

   * "references": a list of references (URLs) to give context to
     the signature.

   * "enable": if set to False the signature will be skipped.

   * "alert": if set to True can be used to specify that the
     signature should be reported (perhaps by a dedicated reporting
     module).

   * "minimum": the minimum required version of Cuckoo to
     successfully run this signature.

   * "maximum": the maximum required version of Cuckoo to
     successfully run this signature.

In our example, we would create the following skeleton:

      from lib.cuckoo.common.abstracts import Signature

      class BadBadMalware(Signature): # We initialize the class inheriting Signature.
          name = "badbadmalware" # We define the name of the signature
          description = "Creates a mutex known to be associated with Win32.BadBadMalware" # We provide a description
          severity = 3 # We set the severity to maximum
          categories = ["trojan"] # We add a category
          families = ["badbadmalware"] # We add the name of our fictional malware family
          authors = ["Me"] # We specify the author
          minimum = "2.0" # We specify that in order to run the signature, the user will simply need Cuckoo 2.0

          def on_complete(self):
              return

This is a perfectly valid signature. It doesn't really do anything
yet, so now we need to define the conditions for the signature to be
matched.

As we said, we want to match a particular mutex name, so we proceed as
follows:

      from lib.cuckoo.common.abstracts import Signature

      class BadBadMalware(Signature):
          name = "badbadmalware"
          description = "Creates a mutex known to be associated with Win32.BadBadMalware"
          severity = 3
          categories = ["trojan"]
          families = ["badbadmalware"]
          authors = ["Me"]
          minimum = "2.0"

          def on_complete(self):
              return self.check_mutex("i_am_a_malware")

Simple as that, now our signature will return "True" whether the
analyzed malware was observed opening the specified mutex.

If you want to be more explicit and directly access the global
container, you could translate the previous signature in the following
way:

      from lib.cuckoo.common.abstracts import Signature

      class BadBadMalware(Signature):
          name = "badbadmalware"
          description = "Creates a mutex known to be associated with Win32.BadBadMalware"
          severity = 3
          categories = ["trojan"]
          families = ["badbadmalware"]
          authors = ["Me"]
          minimum = "2.0"

          def on_complete(self):
              for process in self.get_processes_by_pid():
                  if "summary" in process and "mutexes" in process["summary"]:
                      for mutex in process["summary"]["mutexes"]:
                          if mutex == "i_am_a_malware":
                              return True

              return False


Evented Signatures
==================

Since version 1.0, Cuckoo provides a way to write more high
performance signatures. In the past every signature was required to
loop through the whole collection of API calls collected during the
analysis. This was necessarily causing some performance issues when
such collection would be of a large size.

Since 1.2 Cuckoo only supports the so called "evented signatures". The
old signatures based on the "run" function can be ported to using
"on_complete". The main difference is that with this new format, all
the signatures will be executed in parallel and a callback function
called "on_call()" will be invoked for each signature within one
single loop through the collection of API calls.

An example signature using this technique is the following:

      from lib.cuckoo.common.abstracts import Signature

      class SystemMetrics(Signature):
          name = "generic_metrics"
          description = "Uses GetSystemMetrics"
          severity = 2
          categories = ["generic"]
          authors = ["Cuckoo Developers"]
          minimum = "2.0"

          # Evented signatures can specify filters that reduce the amount of
          # API calls that are streamed in. One can filter Process name, API
          # name/identifier and category. These should be sets for faster lookup.
          filter_processnames = set()
          filter_apinames = set(["GetSystemMetrics"])
          filter_categories = set()

          # This is a signature template. It should be used as a skeleton for
          # creating custom signatures, therefore is disabled by default.
          # The on_call function is used in "evented" signatures.
          # These use a more efficient way of processing logged API calls.
          enabled = False

          def on_complete(self):
              # In the on_complete method one can implement any cleanup code and
              #  decide one last time if this signature matches or not.
              #  Return True in case it matches.
              return False

          # This method will be called for every logged API call by the loop
          # in the RunSignatures plugin. The return value determines the "state"
          # of this signature. True means the signature matched and False it did not this time.
          # Use self.deactivate() to stop streaming in API calls.
          def on_call(self, call, pid, tid):
              # This check would in reality not be needed as we already make use
              # of filter_apinames above.
              if call["api"] == "GetSystemMetrics":
                  # Signature matched, return True.
                  return True

              # continue
              return None

The inline comments are already self-explanatory.

Another event is triggered when a signature matches.

      def on_signature(self, matched_sig):
          required = ["creates_exe", "badmalware"]
          for sig in required:
              if not sig in self.list_signatures():
                  return
          return True

This kind of signature can be used to combine several signatures
identifying anomalies into one signature classifying the sample
(malware alert).

Two more events can be used to write more complex signatures. They are
called when new processes are processed or new threads. "on_process"
and "on_thread". They can be used to reset the state of flags when a
new process is entered. Allowing to write signatures that take into
account if a series of events happened in one thread/process or
globally.

   def on_process(self, pid):
       pass

   def on_thread(self, pid, tid):
       pass


Quickout
========

Additionally to the filters you can use a quickout function to
determine if the signature can be matched at all. For example if a
signature is written to identify behavior of malicious PDFs you could
test for the file type to be PDF. Returning "True" will remove the
signature from the list of potential candidates (reducing False
Positives and processing time).

You can find many more example of signatures in our community
repository.


Matches
=======

Starting from version 1.2, signatures are able to log exactly what
triggered the signature. This allows users to better understand why
this signature is present in the log, and to be able to better focus
malware analysis.

Two helpers have been included in order to specify matching data.

Signature.add_match(process, type, match)

   Adds a match to the signature. Can be called several times for the
   same signature.

   Parameters:
      * **process** (*dict*) -- process dictionary (same as the
        "on_call" argument). Should be "None" except when used in
        evented signatures.

      * **type** (*string*) -- nature of the matching data. Can be
        anything (ex: "'file'", "'registry'", etc.). If match is
        composed of api calls (when used in evented signatures),
        should be "'api'".

      * **match** -- matching data. Can be a single element or a
        list of elements. An element can be a string, a dict or an API
        call (when used in evented signatures).

   Example Usage, with a single element:

      self.add_match(None, "url", "http://malicious_url_detected.com")

   Example Usage, with a more complex signature, needing several API
   calls to be triggered:

      self.signs = []
      self.signs.append(first_api_call)
      self.signs.append(second_api_call)
      self.add_match(process, 'api', self.signs)

Signature.has_matches()

   Checks whether the current signature has any matching data
   registered. Returns "True" in case it does, otherwise returns
   "False".

   This can be used to easily add several matches for the same
   signature. If you want to do so, make sure that all the api calls
   are scanned by making sure that "on_call" never returns "True".
   Then, use "on_complete" with "has_matches" so that the signature is
   triggered if any match was previously added.

   Return type:
      boolean

   Example Usage, from the *network_tor* signature:

      def on_call(self, call, process):
          if self.check_argument_call(call,
                                      pattern="Tor Win32 Service",
                                      api="CreateServiceA",
                                      category="services"):
              self.add_match(process, "api", call)

      def on_complete(self):
          return self.has_matches()


Helpers
=======

As anticipated, from version 0.5 the "Signature" base class also
provides some helper methods that simplify the creation of signatures
and avoid the need for you having to access the global container
directly (at least most of the times).

With Cuckoo 1.2 the amount of information extracted from a sample grew
another step and the resulting data format got more complex. To avoid
having to port your signatures with every extension and to reduce
errors we strongly suggest to use the helpers to access data.

Following is a list of available methods.

Signature.check_file(pattern[, regex=False])

   Checks whether the malware opened or created a file matching the
   specified pattern. Returns "True" in case it did, otherwise returns
   "False".

   Parameters:
      * **pattern** (*string*) -- file name or file path pattern to
        be matched

      * **regex** (*boolean*) -- enable to compile the pattern as a
        regular expression

   Return type:
      boolean

   Example Usage:

      self.check_file(pattern=".*\.exe$", regex=True)

Signature.check_key(pattern[, regex=False[, actions=["regkey_written", "regkey_opened", "regkey_read"][, pid=None]]])

   Checks whether the malware opened or created a registry key
   matching the specified pattern. Returns "True" in case it did,
   otherwise returns "False".

   Parameters:
      * **pattern** (*string*) -- registry key pattern to be matched

      * **regex** (*boolean*) -- enable to compile the pattern as a
        regular expression

      * **actions** (*list*) -- a list of key-access-actions to
        search the key in

      * **pid** (*int*) -- process id to filter for

   Return type:
      boolean

   Example Usage:

      self.check_key(pattern=".*CurrentVersion\\Run$", regex=True)

Signature.check_mutex(pattern[, regex=False])

   Checks whether the malware opened or created a mutex matching the
   specified pattern. Returns "True" in case it did, otherwise returns
   "False".

   Parameters:
      * **pattern** (*string*) -- mutex pattern to be matched

      * **regex** (*boolean*) -- enable to compile the pattern as a
        regular expression

   Return type:
      boolean

   Example Usage:

      self.check_mutex("mutex_name")

Signature.check_api(pattern[, process=None[, regex=False]])

   Checks whether Windows function was invoked. Returns "True" in case
   it was, otherwise returns "False".

   Parameters:
      * **pattern** (*string*) -- function name pattern to be
        matched

      * **process** (*string*) -- name of the process performing the
        call

      * **regex** (*boolean*) -- enable to compile the pattern as a
        regular expression

   Return type:
      boolean

   Example Usage:

      self.check_api(pattern="URLDownloadToFileW", process="AcroRd32.exe")

Signature.check_argument_call(call, pattern[, name=Name[, api=None[, category=None[, regex=False]]])

   Checks whether the malware invoked a function with a specific
   argument value. Returns "True" in case it did, otherwise returns
   "False".

   Parameters:
      * **call** -- the call to check to check the argument in

      * **pattern** (*string*) -- argument value pattern to be
        matched

      * **name** (*string*) -- name of the argument to be matched

      * **api** (*string*) -- name of the Windows function
        associated with the argument value

      * **category** (*string*) -- name of the category of the
        function to be matched

      * **regex** (*boolean*) -- enable to compile the pattern as a
        regular expression

   Return type:
      boolean

   Example Usage:

      self.check_argument_call(call, pattern=".*cuckoo.*", category="filesystem", regex=True)

Signatures.list_signatures()

   Returns a list of signature names that matched so far. It can be
   used to write meta-signatures combining several signatures on
   anomalies into a classification.

   Return type:
      list

   Example Usage:

      def on_signature(self, matched_sig):
          required = ["creates_exe", "badmalware"]
          for sig in required:
              if not sig in self.list_signatures():
                  return
          return True

Signatures.get_processes([name=None])

   An iterator returning the processes monitored. If name is given,
   they will be filtered for the name

   Parameters:
      **name** (*string*) -- Name of the process to filter for

   Return type:
      iterator

   Example Usage:

      for process in self.get_processes("foo"):
          pass

Signatures.get_processes_by_pid([pid=None])

   An iterator returning the processes monitored. If pid is given,
   they will be filtered for the process id

   Parameters:
      **pid** (*int*) -- Process ID of the process to filter for

   Return type:
      iterator

      for process in self.get_processes_by_pid(4):
          pass

Signatures.get_threads([pid=None])

   An iterator returning the threads monitored. If pid is given, they
   will be filtered for the process id.

   Parameters:
      **pid** (*int*) -- Name of the process to filter for

   Return type:
      iterator

      for thread in self.get_threads():
          pass

Signatures.get_files([pid=None[, actions=None]])

   Iterates over the files accessed by a process (or all processes).
   Access type can be a list of "file_written", "file_read",
   "file_deleted". Default is all.

   Parameters:
      * **pid** (*int*) -- Name of the process to filter for

      * **actions** (*list*) -- access types of the files to return

   Return type:
      iterator

      for afile in self.get_files():
          pass

Signatures.get_keys([pid=None[, actions=None]])

   Iterates over the registry keys accessed by a process (or all
   processes). Access type can be a list of "regkey_written",
   "regkey_opened", "regkey_read". Default is all.

   Parameters:
      * **pid** (*int*) -- Name of the process to filter for

      * **actions** (*list*) -- access types of the registry keys to
        return

   Return type:
      iterator

      for akey in self.get_keys():
          pass

Signatures.get_mutexes([pid=None])

   Returns a list of mutexes. Optionally filtered by process id

   Parameters:
      **pid** (*int*) -- Name of the process to filter for

   Return type:
      list

      for mutex in self.get_mutexes():
          pass

Signature.get_net_hosts()

   Returns a list of hosts from the network sniffing part of the
   collected data

   Return type:
      list

      for host in self.get_net_hosts():
          pass

Signature.get_net_domains()

   Returns a list of domains from the network sniffing part of the
   collected data

   Return type:
      list

      for domain in self.get_net_domains():
          pass

Signature.get_net_http()

   Returns a list of http information blocks from the network sniffing
   part of the collected data

   Return type:
      list

      for http_data in self.get_net_http():
          pass

Signature.get_net_udp()

   Returns a list of udp information blocks from the network sniffing
   part of the collected data

   Return type:
      list

      for udp_data in self.get_net_udp():
          pass

Signature.get_net_icmp()

   Returns a list of icmp information blocks from the network sniffing
   part of the collected data

   Return type:
      list

      for icmp_data in self.get_net_icmp():
          pass

Signature.get_net_irc()

   Returns a list of irc information blocks from the network sniffing
   part of the collected data

   Return type:
      list

      for irc_data in self.get_net_irc():
          pass

Signature.get_net_smtp()

   Returns a list of smtp information blocks from the network sniffing
   part of the collected data

   Return type:
      list

      for smtp_data in self.get_net_smtp():
          pass

Signature.check_ip(pattern[, regex=False])

   Checks whether the malware contacted the specified IP address.
   Returns "True" in case it did, otherwise returns "False".

   Parameters:
      * **pattern** (*string*) -- IP address to be matched

      * **regex** (*boolean*) -- enable to compile the pattern as a
        regular expression

   Return type:
      boolean

   Example Usage:

      self.check_ip("123.123.123.123")

Signature.check_domain(pattern[, regex=False])

   Checks whether the malware contacted the specified domain. Returns
   "True" in case it did, otherwise returns "False".

   Parameters:
      * **pattern** (*string*) -- domain name to be matched

      * **regex** (*boolean*) -- enable to compile the pattern as a
        regular expression

   Return type:
      boolean

   Example Usage:

      self.check_domain(pattern=".*cuckoosandbox.org$", regex=True)

Signature.check_url(pattern[, regex=False])

   Checks whether the malware performed an HTTP request to the
   specified URL. Returns "True" in case it did, otherwise returns
   "False".

   Parameters:
      * **pattern** (*string*) -- URL pattern to be matched

      * **regex** (*boolean*) -- enable to compile the pattern as a
        regular expression

   Return type:
      boolean

   Example Usage:

      self.check_url(pattern="^.+\/load\.php\?file=[0-9a-zA-Z]+$", regex=True)

Signature.flags.set(name[, pid=None[, tid=None[, timestamp=None]]])

   Flags can be used to collect information in the on_call section of
   a signature and react (decide to alert) in the on_complete part if
   all required flags are set. Flags are signature specific. They are
   identified by their name. PID, TID and timestamp can be later used
   to identify if a flag was set in a specific process/thread or at a
   specific time.

   Parameters:
      * **name** (*string*) -- Name of the flag to set

      * **pid** (*int*) -- process id

      * **tid** (*int*) -- thread id

      * **timestamp** (*int*) -- timestamp in the log to mark this
        flag for

      def on_call(self, call, pid, tid):
          self.flags.set("foo", 1, 2, 2345)

Signature.flags.find([name=None[, pid=None[, tid=None[, before=None[, after=None]]]])

   Returns a list of flags matching the given criteria

   Parameters:
      * **name** (*string*) -- Name of the flag look for

      * **pid** (*int*) -- process id to filter for

      * **tid** (*int*) -- thread id to filter for

      * **before** (*int*) -- flag timestamp must be <= before-
        timestamp

      * **after** (*int*) -- flag timestamp must be >= after-
        timestamp

      def on_complete(self):
          if self.flags.find("foo"):
              self.data.append({"Flag found matching name": "foo"})
              return True

Signature.mark_start()

   Mark the start of a api-call region relevant for the signature.
   This way the report can contain a link to the API call that
   triggered the signature. As soon as the signature returns "True"
   this mark will be stored in the report. Subsequent start marks will
   overwrite the old one till it is stored in the results with the
   triggering of the signature. So you can set a start mark "on
   suspicion" and overwrite it several times till the signature
   triggers.

   It is marking the api call. So the only reasonable signatures to
   use it is in on_call evented ones

      def on_call(self, call, pid, tid):
          if self.check_argument_call(call, pattern=".*cuckoo.*", category="filesystem", regex=True):
              self.mark_start()
              return True

Signature.mark_end()

   A complementary function to mark_start. It is optional and marks
   the end of a api call range that triggered the signature. It should
   be called before returning the "True" result.

   Without a prior mark_start, the end-mark will not be stored in the
   result.

      def on_call(self, call, pid, tid):
          if self.check_argument_call(call, pattern=".*foo.*", category="filesystem", regex=True):
              self.mark_start()
              return None
          if self.check_argument_call(call, pattern=".*cuckoo.*", category="filesystem", regex=True):
              self.mark_end()
              return True

Signature.deactivate()

   Deactivate a signature. Deactivated signatures will not be notified
   in "on_call" events. This can be used after a signature triggered
   (just before the return) to match this signature only once.

      def on_call(self, call, pid, tid):
          if call["api"] == "LdrGetProcedureAddress":
              self.deactivate()
              return True

Signature.activate()

   Re-activates a signature after it has been de-activated.

      def on_process(self, pid):
          self.activate()
