In the Basics Tutorial, you learned about Semgrep patterns, which are the basic building blocks of Semgrep rules.
Semgrep rules are written in YAML, and contain one or more patterns, along with some other metadata like a message to display whenever the rule finds a match. This YAML syntax enables us to combine patterns with logic operators.
If you want to match either pattern1 OR pattern2, use pattern-either.
In the example provided, two variations of == can be matched using pattern-either.
Complete the second pattern, where the string literal is on the left and the metavariable is on the right.
SEMGREP RULE
rules:
- id: use-string-equals
  message: In Java, do not use == with strings. Use String.equals() instead.
  pattern-either:
  - pattern: if ($X == "...") ...
  - pattern: TODO
TEST CODE
public class Example {
    public int foo(String a, int b) {
        if (a == "hello") return 1;
        // Match here too by adding another pattern clause.
        if ("hello" == a) return 2;
        // Do not match here
        print("hello");
    }
}
In fact, you are not limited to two patterns under the pattern-either. Many of the rules on the Rules page use 5 or more patterns that are all OR'd together!
2. Pattern is not
You can use pattern-not to filter out patterns you do not want to match.
The pattern-not is listed as another item under patterns, since you want to find code that matches the first pattern AND does not match the second.
In the example here, filter out patterns where the first argument of subprocess.call() is a string.
As with pattern-either, you are not limited to just one pattern-not.
In fact, you can even combine them to write a rule that means "code is either this OR that OR that, AND is not this."
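A combined rule of that kind might look like the sketch below. The rule id, message, severity, and the choice of subprocess functions are hypothetical and only illustrate the shape: a pattern-either nested under patterns supplies the OR, and each pattern-not item adds an AND NOT.

```yaml
rules:
- id: subprocess-nonliteral-command   # hypothetical rule id
  patterns:
  # Match any one of these calls (OR).
  - pattern-either:
    - pattern: subprocess.call(...)
    - pattern: subprocess.run(...)
    - pattern: subprocess.Popen(...)
  # Then discard matches whose first argument is a string literal (AND NOT).
  - pattern-not: subprocess.call("...", ...)
  - pattern-not: subprocess.run("...", ...)
  - pattern-not: subprocess.Popen("...", ...)
  message: Call to a subprocess function with a non-literal first argument.   # hypothetical message
  languages: [python]
  severity: WARNING
```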
There are many examples of this in the pre-written rules on the Rules tab. But first, stay here to learn about pattern-inside and metavariable-regex!
3. Pattern is inside
As the name implies, pattern-inside restricts your search to code that lies inside whatever the pattern-inside pattern matches.
A few useful scenarios include: searching inside function definitions, searching before or after function calls, and verifying that a certain module has been imported.
Note that the pipe (|) after pattern-inside is YAML syntax that permits multi-line strings.
Try to match http.ResponseWriter.Write() in the code provided. Your starting rule will also match bytes.Buffer.Write(). Use pattern-inside to only search inside functions which have a parameter of type http.ResponseWriter (this challenge is in Go).
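One possible shape for such a rule is sketched below; treat it as an illustration, not the tutorial's official answer. The rule id, message, severity, and the metavariable names $FUNC and $WRITER are assumptions. Reusing $WRITER in both patterns means the object being written to must be the http.ResponseWriter parameter of the enclosing function.

```yaml
rules:
- id: responsewriter-write   # hypothetical rule id
  patterns:
  # Only look inside functions that take an http.ResponseWriter parameter.
  # The pipe (|) starts a YAML multi-line string.
  - pattern-inside: |
      func $FUNC(..., $WRITER http.ResponseWriter, ...) {
        ...
      }
  # Within that range, match Write() calls on the same $WRITER.
  - pattern: $WRITER.Write(...)
  message: Found a Write() call on an http.ResponseWriter.   # hypothetical message
  languages: [go]
  severity: WARNING
```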
4. Pattern is not inside
A pattern-not-inside filters out any matches inside the range defined by the pattern.
A common use is to filter out matches that are called after a certain function. For example, consider detecting cookies in Java that do not have the secure flag set.
In Java, setting the secure flag is accomplished by instantiating a Cookie object, calling setSecure(true) on it, and finally adding it to the response.
Use pattern-not-inside to filter out cases where setSecure(true) has been called. Your pattern-not-inside pattern must use ellipses in order to capture everything that happens AFTER setSecure(true) is called. Try using the YAML multi-line | symbol and remember your semicolons.
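A rule along these lines might look like the following sketch. The rule id, message, severity, and the metavariable names $COOKIE and $RESPONSE are assumptions made for illustration. The trailing ellipsis extends the excluded range to every statement that follows setSecure(true), and the shared $COOKIE metavariable ties the exclusion to the same cookie object.

```yaml
rules:
- id: cookie-missing-secure-flag   # hypothetical rule id
  patterns:
  # Exclude anything that comes AFTER setSecure(true) on the same cookie.
  - pattern-not-inside: |
      $COOKIE.setSecure(true);
      ...
  # What remains: cookies added to the response without the secure flag.
  - pattern: $RESPONSE.addCookie($COOKIE);
  message: Cookie added to the response without setSecure(true).   # hypothetical message
  languages: [java]
  severity: WARNING
```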
Our final challenge will show you how to specify a metavariable that conforms to a particular regular expression.
5. Metavariable Regex
One final Semgrep pattern type that is very useful is called metavariable-regex.
It allows you to specify that certain metavariables only match variables whose names fit a specified regular expression.
The example provided will match calls to django.db.models.FloatField(...) only if the variable on the left-hand side of the assignment contains the word "fee" or "salary".
Change the example to also match when the variable name contains the word "price".
ANSWER (pattern-either exercise)
rules:
- id: use-string-equals
  message: In Java, do not use == with strings. Use String.equals() instead.
  pattern-either:
  - pattern: if ($X == "...") ...
  - pattern: if ("..." == $X) ...
TEST CODE (pattern-not exercise)
import subprocess

subprocess.call("ls -a .") # Try not to match here.
dir = "/tmp"
subprocess.call("ls -a " + dir) # or here
subprocess.call(dir, shell=True) # or here!
subprocess.call(nonstring) # MATCH THIS
subprocess.call(nonstring, shell=True) # and this!
ANSWER (pattern-not exercise)
rules:
- id: subprocess-call
  patterns:
  - pattern: subprocess.call(...)
  # This says never match if first argument is a string
  - pattern-not: subprocess.call("...", ...)
SEMGREP RULE (metavariable-regex exercise)
rules:
- id: use-decimalfield-for-money
  patterns:
  - pattern-inside: |
      class $M(...):
        ...
  - pattern: $F = django.db.models.FloatField(...)
  - metavariable-regex:
      metavariable: '$F'
      regex: '.*(fee|salary).*'
  message: Found a FloatField used for variable $F. Use DecimalField for currency fields to avoid float-rounding errors.
  languages: [python]
  severity: ERROR
TEST CODE (metavariable-regex exercise)
from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=64)
    description = models.TextField()
    # this is ok
    price = models.DecimalField(max_digits=6, decimal_places=2)
    # this is also ok
    return_rate = models.FloatField()
    # Semgrep finds this because old_fee ends in the word fee, which is in the regex above
    old_fee = models.FloatField()
    # match this
    price_inc = models.FloatField()
ANSWER (metavariable-regex exercise)
rules:
- id: use-decimalfield-for-money
  patterns:
  - pattern-inside: |
      class $M(...):
        ...
  - pattern: $F = django.db.models.FloatField(...)
  - metavariable-regex:
      metavariable: '$F'
      regex: '.*(fee|salary|price).*'
  message: Found a FloatField used for variable $F. Use DecimalField for currency fields to avoid float-rounding errors.
  languages: [python]
  severity: ERROR