Basics tutorial
1. Welcome to the Semgrep Tutorial!
In this tutorial you will learn how to write Semgrep patterns to search your code.
The simplest Semgrep pattern is one that looks exactly like your code.
SEMGREP PATTERN
print('Hello, World!')
TEST CODE
print('Hello, World!')
print('Hello,\
World!')
#print('Hello, World!')
x = "print('Hello, World!')"
ANSWER + EXPLAIN
Notice how Semgrep finds the statement that is broken across two lines, because it is semantically the same, but does not match on comments or strings.
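One way to see why the comment and the string are skipped is to look at the code the way a parser does. The sketch below is not how Semgrep is implemented; it uses Python's standard ast module as a stand-in, and the helper name find_hello_calls and the sample source are made up for illustration. Here the second call is split across lines with parentheses rather than a backslash.

```python
import ast

# Hypothetical sample: two real calls, one comment, one string.
SOURCE = '''\
print('Hello, World!')
print(
    'Hello, World!'
)
#print('Hello, World!')
x = "print('Hello, World!')"
'''


def find_hello_calls(source):
    """Return the starting line numbers of calls to print('Hello, World!')."""
    matches = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"
                and len(node.args) == 1
                and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and node.args[0].value == "Hello, World!"):
            matches.append(node.lineno)
    return sorted(matches)


print(find_hello_calls(SOURCE))  # [1, 2]
```

The comment never reaches the syntax tree, and the string on the last line is a constant rather than a call, so only the two real calls are reported.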

2. The ellipsis operator (...)
The ellipsis operator (...) in Semgrep lets you "skip over" things you don't care about, much as .* does in regular expressions: it abstracts away a sequence of zero or more expressions.
ANSWER + EXPLAIN
Notice how the import alias makes Semgrep recognize lg.info(...) as a call to logging.info(...).
However, if you changed your Semgrep pattern to lg.info(...), Semgrep would no longer find this line of code, since lg is just an alias.
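To make the alias point concrete, here is a rough sketch that contrasts a plain regular expression with a syntax-aware search. It assumes the alias is created with import logging as lg (the tutorial's test code is not reproduced here), and both helper functions are hypothetical illustrations built on Python's re and ast modules, not Semgrep internals.

```python
import ast
import re

# Hypothetical test code; the alias name lg follows the text above.
SOURCE = '''\
import logging as lg

lg.info("starting up")
lg.info("user %s logged in", "alice")
lg.warning("disk almost full")
'''


def regex_matches(source):
    """Text search: lines that literally contain a logging.info(...) call."""
    pattern = re.compile(r"logging\.info\(.*\)")
    return [number
            for number, line in enumerate(source.splitlines(), start=1)
            if pattern.search(line)]


def alias_aware_matches(source):
    """Syntax-aware search: resolve 'import logging as <alias>' first."""
    tree = ast.parse(source)
    aliases = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "logging":
                    aliases.add(alias.asname or alias.name)
    hits = []
    for node in ast.walk(tree):
        # Any number of arguments is accepted, like "..." in the pattern.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "info"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in aliases):
            hits.append(node.lineno)
    return sorted(hits)


print(regex_matches(SOURCE))        # []
print(alias_aware_matches(SOURCE))  # [3, 4]
```

The regular expression sees only text, so it misses both calls made through the alias, while the syntax-aware search finds them regardless of how many arguments they take.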
3. The ellipsis operator in strings "..."
You can also use the ellipsis operator inside of quotation marks "..." to tell Semgrep to match any constant string.
Try matching the first two uses of logging.info by using the ellipsis operator to match all strings.
4. The ellipsis operator: between code
The ellipsis operator can ignore arbitrary sequences of expressions or statements, as well. Semgrep will match if the "ends" of the pattern match, ignoring what is in between them.
5. The ellipsis operator: ordered lists
When matching ordered lists of expressions, Semgrep will match only if the position of an expression is the same in the pattern and the code. So, the pattern open("r") will not match open(filename, "r").
Ellipses can be used to match the first or last expression in a list, like open("r", ...) or open(..., "r"), respectively.
To look for an expression at any location, use ellipses on both sides!
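The positional rules above can be sketched as a toy matcher over argument lists. This is only an illustration of the semantics described here, not Semgrep's algorithm; args_match is a hypothetical helper, and arguments are represented as plain strings of source text.

```python
ELLIPSIS = "..."


def args_match(pattern, args):
    """Return True if the argument list matches the pattern.

    Each pattern item is either a literal argument (compared as text) or
    ELLIPSIS, which stands for zero or more arguments.
    """
    if not pattern:
        return not args
    head, rest = pattern[0], pattern[1:]
    if head == ELLIPSIS:
        # Let the ellipsis absorb 0, 1, 2, ... leading arguments.
        return any(args_match(rest, args[i:]) for i in range(len(args) + 1))
    return bool(args) and args[0] == head and args_match(rest, args[1:])


call_args = ["filename", '"r"']                       # open(filename, "r")
print(args_match(['"r"'], call_args))                 # False: position differs
print(args_match([ELLIPSIS, '"r"'], call_args))       # True: "r" is last
print(args_match([ELLIPSIS, '"r"', ELLIPSIS],
                 ["filename", '"r"', "1"]))           # True: "r" anywhere
```

With ellipses on both sides, the literal may sit at any position, which is the behavior the last sentence above describes.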
6. Metavariables
Metavariables are an abstraction that lets you match something even when you don't know exactly what it is you want to match. Think of metavariables like capture groups in regular expressions.
All metavariables must start with a dollar sign and can only contain uppercase characters, digits, and underscores.
The pattern here uses a metavariable, $FUNC, to match any function name. As written, this pattern will match every function definition.
Try to match only function definitions that end with calls to requests.get() or requests.post(). You can use a metavariable named $METHOD to capture any method of the requests object.
ANSWER + EXPLAIN
How does Semgrep know what to interpret as a metavariable and what is just a normal name, like requests?
Remember: metavariables start with a dollar sign and can only contain uppercase characters, digits, and underscores.
🟢 $X
🟢 $WIDGET
🟢 $USERS_2
🔴 $x
🔴 $some+value
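The naming rule can be written down as a small check. The regular expression below is built directly from the rule as stated in this tutorial; it is a sketch, not Semgrep's own lexer, and looks_like_metavariable is a hypothetical helper name. It does not try to settle edge cases the tutorial leaves open, such as a name that starts with a digit.

```python
import re

# A dollar sign followed by uppercase letters, digits, and underscores only.
METAVARIABLE_RE = re.compile(r"\$[A-Z0-9_]+")


def looks_like_metavariable(name):
    """Return True if the whole name follows the metavariable naming rule."""
    return METAVARIABLE_RE.fullmatch(name) is not None


for candidate in ["$X", "$WIDGET", "$USERS_2", "$x", "$some+value", "requests"]:
    print(candidate, looks_like_metavariable(candidate))
```

A plain name such as requests fails the check because it has no leading dollar sign, which is why it is treated as an ordinary identifier.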
7. A Common Pitfall
Beware: Semgrep can't match partial statements. Semgrep patterns must match an entire, parsable statement in the target language. If you click Run, Semgrep will fail to parse the following pattern:
def get_user():
because it is missing a function body.
Try fixing this pattern by making it match any definition of get_user with any function body.
ANSWER + EXPLAIN
Pro tip: If you are ever having trouble getting Semgrep to parse your Semgrep pattern:
1️⃣ Copy and paste the code you want to match into your Semgrep pattern as a starting place
2️⃣ Generalize the pattern by replacing unimportant lines with ellipses, and adding ellipses where extra code could be placed
3️⃣ Replace the key variables with metavariables
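The parsability requirement can be tried out in plain Python. Semgrep uses its own parser, so the sketch below only borrows Python's ast module as a stand-in. It works here because ... happens to be a valid Python expression, so a pattern that uses an ellipsis as the function body also parses as ordinary Python. The completed pattern shown is one plausible fix for the get_user exercise, not necessarily the tutorial's own answer, and parses_as_python is a hypothetical helper.

```python
import ast


def parses_as_python(snippet):
    """Return True if the snippet is a complete, parsable piece of Python."""
    try:
        ast.parse(snippet)
    except SyntaxError:
        return False
    return True


INCOMPLETE = "def get_user():"          # no function body
COMPLETE = "def get_user():\n    ..."   # ellipsis stands in for any body

print(parses_as_python(INCOMPLETE))  # False
print(parses_as_python(COMPLETE))    # True
```

Starting from real code and generalizing it, as the steps above suggest, keeps every intermediate version of the pattern parsable.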
8. Matching using metavariables
Metavariables can also be used like normal variables in a program.
Say you want to search for places where a file was opened in read mode, but then written to. In Python, this will cause a runtime error.
Use a metavariable, $FD, on the left-hand side of an assignment, and then re-use that metavariable later as if it were the file descriptor.
ANSWER + EXPLAIN
Of course, you could have matched this specific example by using the variable fout instead of the metavariable $FD.
However, if during a refactor, someone changed the name of this variable from fout to fd or anything else, the pattern would no longer match, although the code would still have a bug 🐛.
That is the power of metavariables. 💪
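For reference, this is the kind of bug being hunted, shown as runnable Python. The function name, the temporary file, and the variable fout are illustrative choices, and the pattern in the comment is an assumption about what a solution could look like rather than the tutorial's own answer.

```python
import io
import os
import tempfile

# A pattern along these lines would flag the code below (an assumption):
#   $FD = open($PATH, "r")
#   ...
#   $FD.write(...)


def demo_write_to_read_handle():
    """Open a file in read mode, try to write to it, and report the failure."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w") as setup:
            setup.write("original contents\n")

        fout = open(path, "r")           # opened for reading only
        try:
            fout.write("new contents\n")  # the bug: writing to a read handle
        except io.UnsupportedOperation:
            return True                   # Python refuses the write at runtime
        finally:
            fout.close()
    return False


print(demo_write_to_read_handle())  # True
```

Renaming fout to anything else leaves the bug in place, which is why the pattern binds the handle to a metavariable instead of a fixed name.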
9. Metavariables vs. Ellipses
Metavariables differ from the ellipsis operator: a metavariable must match exactly one expression, whereas the ellipsis operator can match zero or more expressions.
As an example, see this pattern that specifies two metavariables. It says, "find all calls to open() with two arguments, but I don't know what they are."
Click Run and examine the matches. Line 5 in the test code did not match because it has more than two arguments.
Try to match all calls to open with at least two arguments. How might you say "I don't care about what is after the first two arguments?"
ANSWER + EXPLAIN
Using a metavariable tells Semgrep, "something is here, but I don't know what it is." Using an ellipsis tells Semgrep, "I don't care what is between here and there."
How could you change this to match only on calls that specify buffering?
🤔 We will get to keyword arguments next!
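The difference between "exactly two arguments" and "at least two arguments" can be sketched with Python's ast module. The two modes below roughly correspond to the patterns open($X, $Y) and open($X, $Y, ...). The sample calls and the open_calls helper are hypothetical, and this is an approximation of the behavior described above, not Semgrep's matching logic.

```python
import ast

# Hypothetical test code with one, two, and more than two arguments.
SOURCE = '''\
open("a.txt")
open("a.txt", "r")
open("a.txt", "r", 1)
open("a.txt", "r", buffering=1)
'''


def open_calls(source, allow_extra):
    """Return line numbers of open() calls with two leading positional args.

    With allow_extra=False the call must have nothing else (two
    metavariables); with allow_extra=True anything may follow (ellipsis).
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            continue
        extra = len(node.args) - 2 + len(node.keywords)
        if len(node.args) >= 2 and (allow_extra or extra == 0):
            hits.append(node.lineno)
    return sorted(hits)


print(open_calls(SOURCE, allow_extra=False))  # [2]
print(open_calls(SOURCE, allow_extra=True))   # [2, 3, 4]
```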
10. BONUS: match keyword arguments in any order
Semgrep understands the semantics of the language it is scanning. As an example, Semgrep matches keyword arguments in any order in Python.
Write a Semgrep pattern that matches calls to requests.get which pass any string as the URL (a positional argument), any timeout value, and verify=True. Use $SEC as the metavariable for the timeout value.
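The order-insensitive part of this exercise can be sketched in Python by collecting keyword arguments into a dictionary before comparing them. The pattern in the comment is one that would fit the exercise as described, offered as an assumption; matches_exercise is a hypothetical helper, and the URLs are placeholders. Nothing here imports or calls requests, because the calls are only parsed.

```python
import ast

# One pattern that fits the exercise as described (an assumption):
#   requests.get("...", timeout=$SEC, verify=True)


def matches_exercise(call_source):
    """Check a single call expression against the exercise's requirements."""
    node = ast.parse(call_source, mode="eval").body
    if not (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "get"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "requests"):
        return False
    # Exactly one positional argument, and it must be a string literal.
    if len(node.args) != 1:
        return False
    url = node.args[0]
    if not (isinstance(url, ast.Constant) and isinstance(url.value, str)):
        return False
    # A dict keyed by keyword name makes the written order irrelevant.
    keywords = {kw.arg: kw.value for kw in node.keywords}
    if set(keywords) != {"timeout", "verify"}:
        return False
    verify = keywords["verify"]
    return isinstance(verify, ast.Constant) and verify.value is True


print(matches_exercise('requests.get("https://example.com", timeout=5, verify=True)'))
print(matches_exercise('requests.get("https://example.com", verify=True, timeout=5)'))
print(matches_exercise('requests.get("https://example.com", timeout=5, verify=False)'))
```

The first two calls differ only in keyword order and are treated the same, while the third is rejected because verify is not True.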
ANSWER + EXPLAIN
Now that you know all about ellipses and metavariables, you can write some powerful patterns to find bugs and vulnerabilities. 🧙
However, one pattern is often not enough to specify exactly which situations you do and do not want Semgrep to find.