Pattern composition

1. Either / Or

In the Basics Tutorial, you learned about Semgrep patterns, which are the basic building blocks of Semgrep rules.

Semgrep rules are written in YAML and contain one or more patterns, along with other metadata such as a message to display whenever the rule finds a match. This YAML syntax lets you combine patterns with logical operators.

If you want to match either pattern1 OR pattern2, use pattern-either.

In the example provided, pattern-either is used to match two variations of a == comparison.

Complete the second pattern, where the string literal is on the left and the metavariable is on the right.

SEMGREP RULE

rules:
  - id: use-string-equals
    message: In Java, do not use == with strings. Use String.equals() instead.
    languages: [java]
    severity: WARNING
    pattern-either:
      - pattern: if ($X == "...") ...
      - pattern: TODO

TEST CODE

public class Example {
  public int foo(String a, int b) {
    if (a == "hello") return 1;
    // Match here too by adding another pattern clause.
    if ("hello" == a) return 2;
    // Do not match here
    System.out.println("hello");
    return 0;
  }
}

In fact, you are not limited to two patterns under pattern-either. Many of the rules on the Rules page use five or more patterns that are all OR'd together!
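One possible way to complete the exercise above is sketched here. The second pattern mirrors the first with the operands swapped. The languages and severity values are assumptions added so that the rule is complete, and they may differ from the tutorial's own answer.

```yaml
rules:
  - id: use-string-equals
    message: In Java, do not use == with strings. Use String.equals() instead.
    languages: [java]      # assumed: the test code is Java
    severity: WARNING      # assumed severity level
    pattern-either:
      # Metavariable on the left, any string literal on the right.
      - pattern: if ($X == "...") ...
      # Any string literal on the left, metavariable on the right.
      - pattern: if ("..." == $X) ...
```

In both patterns, "..." stands for any string literal and the trailing ... stands for the body of the if statement. A third or fourth variation would simply be added as further items in the same list.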

2. Pattern is not

You can use pattern-not to filter out patterns you do not want to match.

pattern-not is listed as another item under patterns because the items under patterns are combined with AND: you want to find code that matches the first pattern AND does not match the second.

In the example here, filter out patterns where the first argument of subprocess.call() is a string.
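A minimal sketch of such a rule follows. The rule id, message, and severity are hypothetical. The two entries under patterns are combined with AND, so a call is reported only when it matches subprocess.call(...) and its first argument is not a string literal.

```yaml
rules:
  - id: subprocess-call-non-literal   # hypothetical rule id
    message: The first argument of subprocess.call() is not a string literal.
    languages: [python]
    severity: WARNING                 # assumed severity level
    patterns:
      # Start from every call to subprocess.call, whatever its arguments.
      - pattern: subprocess.call(...)
      # Then drop the calls whose first argument is a string literal.
      - pattern-not: subprocess.call("...", ...)
```

The trailing ellipsis in the pattern-not clause allows any further arguments after the string literal, including none.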

As with pattern-either, you are not limited to just one pattern-not.

In fact, you can even combine them to write a rule that means "code is either this OR that OR that, AND is not this".
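As an illustration of that combination, the sketch below nests a pattern-either inside patterns and follows it with more than one pattern-not. The rule id and message are hypothetical, and the choice of subprocess.run as a second function is only an assumption made for the example.

```yaml
rules:
  - id: subprocess-non-literal-command   # hypothetical rule id
    message: A subprocess function is called with a non-literal first argument.
    languages: [python]
    severity: WARNING                    # assumed severity level
    patterns:
      # Either this OR that ...
      - pattern-either:
          - pattern: subprocess.call(...)
          - pattern: subprocess.run(...)
      # ... AND is not this, AND is not that.
      - pattern-not: subprocess.call("...", ...)
      - pattern-not: subprocess.run("...", ...)
```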

There are many examples of this in the pre-written rules on the Rules tab. But first, stay here to learn about pattern-inside and metavariable-regex!

3. Pattern is inside

As the name implies, pattern-inside restricts matches to code that lies inside the range matched by the pattern you give to pattern-inside.

A few useful scenarios include: searching inside function definitions, searching before or after function calls, and verifying that a certain module has been imported.

Note that the pipe (|) after pattern-inside is YAML syntax that permits multi-line strings.

This challenge is in Go. Try to match http.ResponseWriter.Write() in the code provided. Your starting rule will also match bytes.Buffer.Write(). Use pattern-inside to search only inside functions that have a parameter of type http.ResponseWriter.
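One way to approach this challenge is sketched below. The rule id and message are hypothetical, and the sketch assumes the http.ResponseWriter is the first parameter of the enclosing function, as it is in a typical handler. Because the same metavariable appears in both clauses, only Write() calls on that parameter are reported.

```yaml
rules:
  - id: response-writer-write   # hypothetical rule id
    message: Found a call to Write() on an http.ResponseWriter.
    languages: [go]
    severity: WARNING           # assumed severity level
    patterns:
      # The pipe starts a YAML multi-line string holding the enclosing function.
      - pattern-inside: |
          func $FUNC($WRITER http.ResponseWriter, ...) {
            ...
          }
      # $WRITER must be the same variable that was bound in pattern-inside.
      - pattern: $WRITER.Write(...)
```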

4. Pattern is not inside

A pattern-not-inside filters out any matches inside the range defined by the pattern.

A common use is to filter out matches that occur after a certain function has been called. For example, consider detecting cookies in Java that do not have the secure flag set.

In Java, setting the secure flag is accomplished by instantiating a Cookie object, calling setSecure(true) on it, and finally adding it to the response.

Use pattern-not-inside to filter out cases where setSecure(true) has been called. Your pattern-not-inside pattern must use ellipses in order to capture everything that happens AFTER setSecure(true) is called. Try using the YAML multi-line | symbol and remember your semicolons.
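A sketch of a rule along these lines is shown here. The rule id, message, and metavariable names are hypothetical. The pattern-not-inside range starts at the setSecure(true) call and, through the ellipsis, extends over every statement that follows it in the same block.

```yaml
rules:
  - id: cookie-missing-secure-flag   # hypothetical rule id
    message: A cookie is added to the response without setSecure(true).
    languages: [java]
    severity: WARNING                # assumed severity level
    patterns:
      # Exclude anything that comes AFTER setSecure(true) on the same cookie.
      - pattern-not-inside: |
          $COOKIE.setSecure(true);
          ...
      # Report the remaining places where the cookie is added to a response.
      - pattern: $RESPONSE.addCookie($COOKIE);
```

The semicolons matter because the Java patterns here are statements, not bare expressions.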

Our final challenge will show you how to specify a metavariable that conforms to a particular regular expression.

5. Metavariable Regex

One final Semgrep pattern type that is very useful is called metavariable-regex.

It allows you to specify that a metavariable matches only when the text it is bound to, such as a variable name, fits a specified regular expression.

The example provided will match calls to django.db.models.FloatField(...) only if the left-hand side of the assignment contains the word "fee" or "salary".

Change the example to also match when the variable that receives the return value contains the word "price".
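A possible solution is sketched below. The rule id, message, and the metavariable name $FIELD are hypothetical. The regular expression is wrapped in .* on both sides on the assumption that the word may appear anywhere in the variable name.

```yaml
rules:
  - id: float-field-for-money   # hypothetical rule id
    message: A FloatField is assigned to a variable that looks like a money amount.
    languages: [python]
    severity: WARNING           # assumed severity level
    patterns:
      # Bind the left-hand side of the assignment to $FIELD.
      - pattern: $FIELD = django.db.models.FloatField(...)
      # Keep the match only if the bound name contains one of these words.
      - metavariable-regex:
          metavariable: '$FIELD'
          regex: '.*(fee|salary|price).*'
```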
