Saturday, April 9, 2016

Getting the Most Out of Static Analysis and Avoiding Pitfalls


Being Lulled by the Magic of the Big Red Button

I have seen it too often.  A sales guy talks to a security decision maker about the need to proactively find vulnerabilities.  Then the sales guy shows the decision maker an amazing product which finds vulnerabilities in source code.  The decision maker buys the product and gives it to a security engineer.

The security engineer is given some basic training (one week, or none at all) and is tasked with running the tool on their source code.

This is where the value you get out of the tool may vary.

Sometimes you get lucky and scan an application which is supported by the static analysis tool (often a Web application using a well supported language/framework).  Other times you scan code that is very esoteric, such as a smart card or embedded secure element application, and are lucky to get even a few of the more serious security findings (buffer overflow, integer overflow, command injection, SQL injection, etc.).

There are many things that could have gone wrong.  This article will give you the skills to troubleshoot what may have gone wrong and help you get the most out of your static analysis investment.

Overall Process

Getting the most out of your static analysis can be easier when you follow a logical process.  Here is the process I am advocating:

·      Understand What you are Scanning (Language, Web app, mobile app, trusted app, library, etc.)
·      Validate the Integrity of Your Scan
·      Add Custom Rules
·      Add Canaries
·      Understand What Static Analysis Cannot Do


Understanding What You are Scanning

You need to make sure that the static analysis tool you are using has rules to find the more serious vulnerabilities in your app.  To verify this with your static analysis tool vendor, you need to understand what language, frameworks, libraries, and backends your application uses and what environments it runs in.

If you are running the Spring MVC web framework with a JDBC based backend then your tool will probably provide good coverage.  However, if you are using a more esoteric framework or your own custom web framework then most tools will require you to write custom rules to find the more “meaty” vulnerabilities (XSS, Path Manipulation, Buffer Overflow, Integer Overflow, SQL Injection, Command Injection, etc.).  The type of custom rules required depends on where the rule coverage is lacking.  For example, if you are using a custom web framework which receives input from the user, you will need to create source or entrypoint rules.  If you use a custom backend or a less popular database then you will need to create sink rules.  And if you use a 3rd party library in your application and do not include the library’s source during the scan then you will need passthrough rules.

When you understand what you are scanning you can ask your static analysis tools vendor more intelligent questions such as, “Do you have support for framework X?” or “Can your tool find vulnerabilities in usage of database X APIs?”  In a lot of cases, multiple vendors will say “Yes”.  This is not enough.  When trying to select between different static analysis tools vendors ask them to show you the rules specific to your framework, libraries, and back-ends.  Look at the quality of the rules.  Don’t just compare the number of rules.  Make sure the rules cover the most serious vulnerabilities that you are aware of in your framework, library or back-ends.

Once you are confident in the rule coverage your static analysis tool vendor provides for your application components, you can proceed to scan your application.

The first thing you should do after scanning your application is assess the integrity of the scan.


Validating the Integrity of the Scan

Now that you think you have scanned the application, you need to verify that the scan covered all of the files in your application and occurred without any errors. 

In this sub-process, you will ask yourself the following questions:

  • Did all of your source and application artifacts get scanned by your static analysis tool (Verifying Coverage)?
  • Were there any errors during the translation?
  • Were there any errors during the analysis phase?
  • All these worked fine…Now What?

Verifying Coverage

Almost every static analysis engine will convert your source files into some type of intermediate file representation which can be analyzed more generically.  Each static analysis tool differs in the exact format, but most will store these intermediate files on the file system.  Each intermediate file will have a name similar to that of the source file it was created from:

Your/source/directory/  =>  static-analysis/engine/build_id/

You can quickly compare the program files in your source directory with the generated intermediate files in static analysis engine build_id directory.  You will need to contact your static analysis vendor support team to find the specific location of these intermediate representation files. 
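The comparison described above can be scripted. The following is a minimal sketch, not any vendor's tooling: it assumes intermediate files keep the base name of the source file they came from, and the class name CoverageCheck and the ".ir" extension used in the usage note are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CoverageCheck {
    // Strip the directory and the final extension: "src/app/Foo.java" -> "Foo".
    static String baseName(String path) {
        String name = Paths.get(path).getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Return the source files that have no intermediate file with the same base name.
    static Set<String> untranslated(Collection<String> sourceFiles,
                                    Collection<String> intermediateFiles) {
        Set<String> translated = intermediateFiles.stream()
                .map(CoverageCheck::baseName)
                .collect(Collectors.toSet());
        return sourceFiles.stream()
                .filter(s -> !translated.contains(baseName(s)))
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Recursively list the regular files under a directory as path strings.
    static List<String> listFiles(Path root) {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                    .map(Path::toString)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, untranslated(listFiles(sourceDir), listFiles(buildIdDir)) would list every source file that has no counterpart such as Foo.ir. Matching on base name alone treats two files with the same name in different packages as one, so treat the output as a starting point for investigation.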

In Coverity, you can see which files were translated by looking for “cov-translate” in your build.log or by looking in the intermediate directory (<idir>) that you gave to cov-build.

In Checkmarx, the <Engine Install Dir>\Logs\ScanLogs\<Project name>\YYYYMMDDT######-##_#_#########_############_#-##########_####.log will list the files that have been translated.

If there are files that were not translated (you are missing translated files), you will need to identify the cause.  In some cases, the command line tools that translate the source files may be misspelled in the make files.  Other times there will be a bug introduced in the modified make file which will cause the build to fail part way through the build.  Other times, the compiler used by the application is not recognized by the static analysis tool.  You will need to make modifications to the static analysis tool configuration settings to recognize your compiler.  Once you have verified that all of the relevant files have been translated or that you cannot get more files to translate then you will need to see if there were errors during or after the translation process.

To identify whether an error occurred during translation, you will need to get the static analysis tool to output debug information to a log file.  Every static analysis tool has a way of doing this, and you can look in the User’s Guide for your tool to find out how to enable this feature.

Once the feature is enabled and you have the logs, you will need to check the logs for errors and exceptions.  Most errors that I have encountered have been related to not being able to resolve dependencies, running out of resources, or exceptions thrown during translation.

In addition, seeing an output file as a result of the translation phase is not enough.  Make sure that the size is non-zero and if you are really paranoid look into the file to make sure all methods in the source class are represented in the translated file.

When you get a message that a class was not able to be resolved, the tool is telling you that the source code was not provided and thus dataflow through these objects will be blocked.  You will need to find the jar file or library containing the missing class/resource and include it as part of your classpath to get rid of this issue.

Errors and exceptions during translation to intermediate files are usually due to the parser not being able to understand certain coding constructs.  There may be settings to enable support for different versions of programming language constructs.  For example, Java 8 formally introduced lambda expressions into the language.  If the static analysis engine is configured to parse Java 1.7 structures, it may fail parsing 1.8 lambda expressions.  Once all of the files have been successfully translated, you will need to check the logs for analysis failures. 
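A small probe file can help confirm the language level the parser is configured for. The class below is a hypothetical example (LambdaSample is not from any tool): it uses a lambda expression and the stream API, so it compiles only at Java 8 source level or later, and an engine set to parse Java 7 would be expected to report a translation error on it.

```java
import java.util.List;
import java.util.stream.Collectors;

class LambdaSample {
    // The lambda (s -> s.toUpperCase()) is Java 8 syntax; a Java 7 parser rejects it.
    static List<String> upperCase(List<String> names) {
        return names.stream()
                .map(s -> s.toUpperCase())
                .collect(Collectors.toList());
    }
}
```

If this file translates cleanly but your real code does not, the language level is probably not the cause and the logs deserve a closer look.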

Analysis Failures

You can identify analysis failures by looking for messages similar to “dataflow analysis failed …”, “error”, “exception” or “[some type of] analysis failed”.  If this type of error occurs, then you will need to contact the support team of your static analysis vendor.  The most common type of analysis error occurs when the analysis engine runs out of resources.  Common resources that cause problems are lack of memory (OutOfMemoryException) or lack of disk (missing file descriptors or space).  The only way to solve these types of resource problems is by adding more resources (disk space, memory, CPUs, etc.).  It is typical to run your scans on systems with 128 GB of RAM or more, especially if you can run the scan in parallel.  Also, using SSDs could reduce the translation times for large scans.
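Searching the logs for these messages is easy to automate. The sketch below is only an illustration: the keyword list comes from the messages mentioned above, while the class name ScanLogCheck is hypothetical and real log formats differ between tools.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ScanLogCheck {
    // Keywords taken from the failure messages described in the text.
    private static final Pattern FAILURE =
            Pattern.compile("analysis failed|error|exception", Pattern.CASE_INSENSITIVE);

    // Return every matching log line, prefixed with its 1-based line number.
    static List<String> suspiciousLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < logLines.size(); i++) {
            String line = logLines.get(i);
            if (FAILURE.matcher(line).find()) {
                hits.add((i + 1) + ": " + line);
            }
        }
        return hits;
    }
}
```

The lines could be loaded with Files.readAllLines on whatever log file your tool writes. A plain keyword match will also flag harmless lines such as a “0 errors” summary, so the hits still need to be read by a person.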


So now you are translating all your files and there are no errors in the logs, yet you may still not be getting the “meaty” security related findings (and even if you are, you should still read on to possibly get more).  In order to move forward you need to understand the static analysis process.

Getting an Understanding of the Internal Process of the Static Analysis Engine to Dig Deeper

There are usually several steps that all static analysis engines go through when analyzing code.  The first is translating the code into a format which can be analyzed.  Once the source code is in that format, the analysis can occur.  There are different kinds of analysis that the static analysis engine can do on your source code.  Dataflow analysis is the one most commonly used to find “meaty” security vulnerabilities.

With dataflow analysis, a static analysis engine traces through the source code of the application looking for paths from sources or entry points to sinks.  Sources and entry points are where untrusted (attacker provided) data enters your application.  Sinks are locations in code where untrusted data may cause a vulnerability.

Tracing dataflow usually depends on having all the source code in the call path.  If you don’t have the source for a particular library in a call path, you will need to write pass-through rules to help the static analysis engine “connect the dots”.  Without a pass-through rule, the engine will not know how to trace the dataflow further through the 3rd party API calls.  3rd party libraries here are the jar files that you include in the “lib” folder of your application whose source was not included in the scan.  The important thing to understand is that if your call paths go through 3rd party libraries before reaching sinks, you will usually need to either write pass-through rules for those libraries or include their source in your scan to get results flowing through them.  Some smarter static analysis engines (Checkmarx) use heuristics to generate pass-through rules on the fly.
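The effect of a missing pass-through rule can be shown with a toy model. This is a deliberately simplified sketch and not how any real engine is implemented: real engines track data at the level of variables and expressions, while here each method is just a node in a graph. The class name ToyTaintTracer and all method names in the usage note are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ToyTaintTracer {
    // Breadth-first walk from the sources; returns the sinks that tainted data can reach.
    // "opaque" holds library methods scanned without source; the walk stops at them
    // unless a pass-through rule exists for that method.
    static Set<String> reachableSinks(Map<String, List<String>> flows,
                                      Set<String> sources,
                                      Set<String> sinks,
                                      Set<String> opaque,
                                      Set<String> passthroughRules) {
        Set<String> seen = new HashSet<>(sources);
        Deque<String> work = new ArrayDeque<>(sources);
        Set<String> found = new TreeSet<>();
        while (!work.isEmpty()) {
            String node = work.poll();
            if (sinks.contains(node)) {
                found.add(node);
            }
            if (opaque.contains(node) && !passthroughRules.contains(node)) {
                continue; // dataflow is blocked at this library call
            }
            for (String next : flows.getOrDefault(node, Collections.<String>emptyList())) {
                if (seen.add(next)) {
                    work.add(next);
                }
            }
        }
        return found;
    }
}
```

With a flow of request.getParameter to lib.format to Statement.execute, where lib.format is opaque, the method returns no sinks until lib.format is added to the pass-through set. That is the same situation as a scan that stays silent because one library call sits between the source and the sink.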

If you added source, entrypoint, passthrough, and sink rules and/or included the source code of the open source libraries in the scan but are still not getting security findings, you may have problems with your new custom rules.

Coverage of Rules Determines How Well You Will Find Vulnerabilities

Every static analysis tool has a set of core rules.  These rules have to be written for different types of applications, frameworks, and architectures.  The problem is that most of these rules are tied to a core set of popular languages, applications (web and mobile), and frameworks.  If you are working with less popular architectures, frameworks, trusted execution environments, embedded Secure Elements (eSE) or libraries, you will probably find your tool lacking in direct support of these esoteric architectures, languages, and frameworks.  This is because there are no rules that can generate the data flows needed to identify vulnerabilities in your architecture, application, framework or library.  So how can you tell if there is a rules problem in the first place?  You need a way to verify that a vulnerability is reported when it is present.

Creating Static Analysis Code Canaries

Canaries were taken into coal mine shafts with the miners.  If the birds started acting weird or got sick, the miners knew to exit the mine because of the dangerous situation.  A canary in the static analysis sense means adding a vulnerability to your source code that should be identified by your static analysis engine to prove that it is working properly.  The placement of canaries is important because you want to ensure coverage over the common data flows through your application.  The placement of your canary should take 3rd party libraries and event based code paths into account.  If possible, you should place canaries at ever increasing levels of depth in your application.  Also vary the type of location, i.e., front end code for XSS, backend code for SQL Injection, and business logic for file path manipulation.  Ensure that you place comments around your code such as:

//Canary Begin
boolean businessLogicMethod(String untrusted_param, … ) {
    // Intentional path manipulation: untrusted data reaches the File constructor
    File f = new File(untrusted_param);
    return f.exists();
}
//Canary End

Then add a rule to look for the keyword “Canary” in comments.
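After a scan, the markers can also be cross-checked against the reported findings. The following is a rough sketch under stated assumptions: it takes the lines of one source file and the set of line numbers where the tool reported findings in that file (how you export those depends on your tool), and the class name CanaryCheck is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CanaryCheck {
    // Return the line number of every "//Canary Begin" marker whose block
    // contains no reported finding before the matching "//Canary End".
    static List<Integer> missedCanaries(List<String> sourceLines, Set<Integer> findingLines) {
        List<Integer> missed = new ArrayList<>();
        int begin = -1;      // line number of the open canary block, or -1 if none
        boolean hit = false; // whether a finding was seen inside the open block
        for (int i = 0; i < sourceLines.size(); i++) {
            int lineNo = i + 1;
            String line = sourceLines.get(i).trim();
            if (line.startsWith("//Canary Begin")) {
                begin = lineNo;
                hit = false;
            } else if (line.startsWith("//Canary End") && begin != -1) {
                if (!hit) {
                    missed.add(begin);
                }
                begin = -1;
            } else if (begin != -1 && findingLines.contains(lineNo)) {
                hit = true;
            }
        }
        return missed;
    }
}
```

An empty result means every canary in the file was detected. Any line number returned points at a canary the scan walked past.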

If you scan your code and the canaries do not show up as vulnerabilities, then you know you have a missing rules problem.  So now that you recognize that you have a rules problem, how do you know which types of rules to write?

Adding Custom Rules

When you are determining what type of rules to write, you need to consider the type of application you are scanning.  It will also help to get in contact with the support team of your static analysis vendor to ask them directly if they support your architecture, framework, language or library.  You will need to tell them what type of application you are scanning (mobile, web, REST, kernel, eSE, trusted application, etc.), as well as all the libraries, frameworks, services, 3rd party components and back-ends (Oracle, No-SQL [Mongo-DB, Cassandra, Riak, etc.], REST) used by the application.

In some unique cases, because of the framework used or the architecture, you will need to write source or entrypoint rules to identify where untrusted data enters the application.  This is usually dependent on the framework or callback methods of the architecture.  For example, there may be a main method where the parameters to the method are passed in from untrusted data sources.  If the code paths utilize 3rd party libraries to filter, enhance or modify the untrusted data before reaching sinks, you will usually need to write pass-through rules.  If you know that certain back-ends, libraries or services that you are using are not supported by the static analysis engine, then you will need to do a threat analysis of the back-end, library, or service API to determine what sink rules need to be written.

In some cases, your rules will need to be flexible.  For example, if you are using a framework which supports “convention over configuration”, you will need to use markers, such as the parameters of a public method in a class which extends a base class, to identify entry points for attacker provided data.  These custom rules will allow you to create the necessary data flows to catch the more serious vulnerabilities like buffer overflow, integer overflow, and other code execution vulnerabilities.
However, there are some cases where the sources of untrusted data are not clearly identified in the code being scanned.

Libraries are one example of code that lacks clear sources of untrusted data.  This is because libraries inherit sources of untrusted data from the web, mobile, and desktop applications that they are used within.  Typically, a call path will start at a web, mobile, or desktop application input interface, then pass through a 3rd party library, and finally end up at a sink.  Sources in web applications are different from sources in mobile and desktop applications.  Basically, to find vulnerabilities in libraries you need to scan the source code of the library together with the source code of the application that it runs within, so that real-world sources can be applied to the dataflow through the library.  If you are a company that only develops a library, you will need to develop entry point or characterization rules to create artificial sources and entry points through your library to find vulnerabilities.

Rule Types

Each static analysis engine will allow you to write custom rules to enhance its results and meet your unique application, framework, or library.  The following are the most common types of rules that you will create to meet 90% of your needs.


Source

Source rules allow you to define runtime function calls which return untrusted data (data controlled by external entities), or functions with IN_OUT parameter types which return untrusted data via their IN_OUT parameters.  One of the most well known sources in web applications is request parameters.  In Java Servlets the method used to get request parameters is:

String untrustedData = request.getParameter("parameterName");

The getParameter(…) method returns attacker controlled data in the query string of a URL request to the server.

An attacker could type the following into the URL: '%20or%201=1%20--
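To see why this matters, it helps to follow the payload above from the URL to a query string. The sketch below is illustrative only: the class name InjectionDemo and the users table and name column are hypothetical, and in a real servlet container the decoding is done for you before getParameter(…) returns.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class InjectionDemo {
    // Decode a URL-encoded parameter value the way a servlet container would.
    static String decode(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Vulnerable on purpose: untrusted data is concatenated into the SQL text.
    static String buildQuery(String untrustedData) {
        return "SELECT * FROM users WHERE name = '" + untrustedData + "'";
    }
}
```

Here decode("'%20or%201=1%20--") yields ' or 1=1 -- and buildQuery then produces SELECT * FROM users WHERE name = '' or 1=1 --' in which the attacker's text has become part of the SQL statement. A source rule on getParameter(…) is what lets the engine mark that value as untrusted in the first place.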

In Checkmarx, the source rule would look like the following:
return All.FindByMemberAccess("ServletRequest.getParameter");

In Coverity, it would look like the following:
{
  "taint_kind": "servlet",
  "tainted_data": {
    "return_value_of": {
      "or": [
        {
          // The native getParameter call
          "matching": "^javax\\.servlet\\.ServletRequest\\.getParameter\\("
        },
        {
          // All implementations that override the getParameter
          "overrides": "^javax\\.servlet\\.ServletRequest\\.getParameter\\("
        }
      ]
    }
  }
}

In Fortify, it would look like the following:
<DataflowSourceRule language="java" formatVersion="3.8"> <RuleID>xxxx</RuleID><FunctionIdentifier>
<NamespaceName><Pattern>javax\.servlet</Pattern></NamespaceName> <ClassName><Pattern>ServletRequest</Pattern></ClassName>
<ApplyTo implements="true" overrides="true" extends="true">

Entry Point

An entry point rule is similar to a source rule in that it identifies where untrusted data is coming into the application but instead of identifying a runtime function which returns untrusted data, an entry point rule identifies function definition parameters of methods which should be considered untrusted.

A good example is

public static void main (String args[]) {

This method is the main entry point for command line parameters that can be provided by an attacker when starting your application.

In Checkmarx, the rule would look like the following:

In Coverity, it would look like the following:
{
  "taint_kind": "servlet",
  "tainted_data": {
    "all_params_of": {
      "and": [
        {
          "with_annotation": {
            "matching": "^org\\.springframework\\.web\\.bind\\.annotation\\.RequestMapping$"
          }
        },
        {
          "in_class": {
            "with_super": {
              "with_annotation": {
                "matching": "^org\\.springframework\\.stereotype\\.Controller$"
              }
            }
          }
        }
      ]
    }
  }
}

In Fortify, it would look like the following:
<DataflowEntryPointRule language="java" formatVersion="3.8"> <RuleID>xxxx</RuleID> <FunctionIdentifier>
<NamespaceName> <Pattern>.*</Pattern></NamespaceName>
<ApplyTo implements="true" overrides="true" extends="true">


Sink

A sink rule identifies parameters of a function which, if untrusted data were to reach them, would cause an exploitable vulnerability.

A SQL Injection vulnerability sink would be the first argument to


In Checkmarx, the rule would look like the following:

In Coverity, it would look like the following:
{
  "type" : "Coverity analysis configuration",
  "format_version" : 1,
  "directives" : [
    // Definition of the checker with name, description, remediation
    {
      "dataflow_checker_name" : "DF.FAKE_NOSQL_QUERY_INJECTION",
      "taint_kinds" : [ "network", "servlet", "database", "filesystem", "console", "environment", "system_properties", "rpc" ],
      "languages": {
        "Java" : "Webapp-Security-Preview"
      },
      "covlstr_sink_message" : "{CovLStrv2{{t{A tainted value {0} is evaluated by a NoSQLsink.}{\"\"}}}}",
      "covlstr_remediation_advice" : "{CovLStrv2{{t{Avoid evaluating untrusted data.}}}}"
    },
    // Sinks
    {
      "sink_for_checker": "DF.FAKE_NOSQL_QUERY_INJECTION",
      "sink" : {
        "param_index": 1,
        "methods": {
          "overrides": {
            "matching" : "java\\.sql\\.Statement\\.execute\\(.*" }
        }
      }
    }
  ]
}

In Fortify, it would look like the following:
<DataflowSinkRule language="java" formatVersion="3.8"> <RuleID>xxxx</RuleID><FunctionIdentifier>
<NamespaceName><Pattern>java\.sql</Pattern></NamespaceName> <ClassName><Pattern>Statement</Pattern></ClassName>
<ApplyTo implements="true" overrides="true" extends="true">


Passthrough

A passthrough rule allows the static analysis engine to create a shortcut through a 3rd party library while preserving any markers on the dataflow.

In Checkmarx, the rule would look like the following:
Not needed

In Coverity, it would look like the following:
public class String {
  public static String format(String format, Object... args) {
    return unknown();
  }
}

In Fortify, it would look like the following:
<DataflowPassthroughRule language="java" formatVersion="3.8"> <RuleID>xxxx</RuleID> <FunctionIdentifier>
<NamespaceName> <Pattern>java\.lang</Pattern></NamespaceName>
<ApplyTo implements="true" overrides="true" extends="true">

Dynamic Application of Sink, Source or Entrypoint rule to Matched Code Constructs

A hybrid rule allows you to match on program constructs programmatically and set those matched items as sinks, entrypoints, sources, or passthroughs.

In Checkmarx, the rule would look like the following:
CxList input = All.FindByName("input");
CxList db = All.FindByName("execute");
CxList sanitize = All.FindByName("fix");
return db.InfluencedByAndNotSanitized(input, sanitize);

In Fortify, it would look like the following:
<CharacterizationRule formatVersion="3.8" language="java"><RuleID>zzzz</RuleID>
    <StructuralMatch><![CDATA[
            Variable p: p.enclosingFunction is
            [Function f: f.parameters contains p] and p.enclosingFunction.annotations contains [Annotation b: matches "org\.restlet\.resource\.(Delete|Put|Post|Get)"]
    ]]></StructuralMatch>
    <Definition><![CDATA[
            foreach p {
                        TaintEntrypoint(p, {+WEB +XSS})
            }
    ]]></Definition>
</CharacterizationRule>

Now you are getting pretty “meaty” vulnerabilities from your tool, but it does not end there.  Static analysis cannot find all security vulnerabilities in your code. 

Understand What Static Analysis Cannot Do

In order to do a complete risk analysis of your application you need to consider the following:

·      Manual Code Review
·      Penetration Testing
·      Problems with 3rd Party Libraries

Manual Code Review

There are just some things which are difficult for a machine to understand.  Maybe AI will handle these in 20-40 years but for now you need to handle the following:

·      Authentication
o   Password Handling (Reset, Storage, and Policies)
·      Backdoors
·      Authorization
·      Input Validation
o   Negative values
·      Session Management
·      Proper Cryptography
·      Auditing
·      Business Logic Vulns
·      Information Leakage
·      File Upload/Download Logic
·      Error Handling
·      Languages not covered well by tool or cross language calls (JNI, Unmanaged Code, system calls, etc.)

If you want more details on how to review the bulleted items above please see:

Penetration Testing

Penetration testing provides a great way to verify fixes and look for unresolved vulnerabilities.  Penetration testing gives you a chance to find vulnerabilities outside the application code which result from interactions with other systems and services.  It also is a way to look at the application from an attacker’s point of view.

Penetration testing and source code analysis combined together can do a comprehensive job of assessing risk in applications but there is one other area of concern—3rd party libraries.

Problems with 3rd Party Libraries

3rd party libraries are usually included with your application as dlls, jar files, shared objects, etc.  The problem arises when you scan the application without the library’s source code.  This can cause two problems: 1. The dataflows through the library are blocked  2. The vulnerabilities in the libraries themselves are hidden.

Source (or Entrypoint) + ??? (No sinks) = No Vulnerability Found through    

When dataflows are blocked, the scanner will not be able to link from a source to a sink. 

You can create passthrough rules through the library to connect the Sources and Sinks. 

You can also include the source code of the library itself in the scan.  This will allow you to see the vulnerabilities in the library as well as connect sources to sinks in the application.

In addition, you will also want to run a tool like OWASP Dependency Check.  This is a great free tool which will look at the Java and C# libraries in your application and alert you to known vulnerabilities for the version of your library that you are using.

Scanning a library by itself leads to another set of problems. 

Problems Scanning a Library by Itself

Usually a library inherits its sources and sinks from the application that it runs in.  When a library is scanned in isolation, it doesn’t have sources so the “meaty” vulnerabilities do not usually show up.

Empty Source/Entry points + Sink = No Vulnerability Found

The solution is to write entrypoint rules on all public interfaces to the library.  This allows the static analysis tool to create the necessary dataflows to find the “meaty” vulnerabilities within the library itself.

Entrypoint rules on public interfaces + Sink = Library Vulnerability
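Before writing those entrypoint rules it helps to have an inventory of the library's public interface. The following is a minimal sketch using Java reflection, assuming that public methods taking a String parameter are the first candidates worth reviewing. The class names EntryPointCandidates and SampleLibrary are hypothetical, and the output is a list for a human to review, not a rule in any vendor's format.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a class from the library being scanned.
class SampleLibrary {
    public void open(String path) { }
    public int size() { return 0; }
    private void internal(String s) { }
}

class EntryPointCandidates {
    // Names of public methods declared on the class that accept at least one String.
    static List<String> publicStringTakingMethods(Class<?> libraryClass) {
        List<String> names = new ArrayList<>();
        for (Method m : libraryClass.getDeclaredMethods()) {
            boolean takesString = Arrays.asList(m.getParameterTypes()).contains(String.class);
            if (Modifier.isPublic(m.getModifiers()) && takesString) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

For SampleLibrary this reports only open, since size takes no String and internal is private. Other parameter types (byte arrays, streams, collections) can carry attacker data too, so the String filter is just a starting point.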

Conclusion and Thanks

You should now have a deeper understanding of static analysis tools and how to get more value out of your tool.  Using the techniques in this article has increased result output by over 10X in certain cases for security issues like buffer overflow.

I would like to give thanks to the following people for helping me with this work and providing rules and insight:

Sung Min Hong (Staff Engineer)
Romain Gaucher (Security Researcher at Coverity)
Alvaro Muñoz (Principal Security Researcher at HP Fortify)
Amit Ashbel (Product Marketing at Checkmarx)
Ohad Rabinovich (Engine Product Manager at Checkmarx)

Abraham Kang Bio

Abraham worked with Wells Fargo for over 4 years as an Information Security Engineer and Security Code Reviewer using static and dynamic analysis tools.  He also worked as a Principal Security Researcher for Fortify developing custom rules for the Fortify Static Code Analysis (SCA) tool.  Currently Abraham works for a major mobile communications company.  He has over 8 years of experience using static analysis tools.
