Using semgrep
While taking a look at the changes required to implement a solution for the Add support to SAST for the --disable-nosem option issue, I noticed that --disable-nosem and --sarif seemed to interact in a curious way.
What I observed
When I passed --disable-nosem to the semgrep executable, the results seemed the same as if I had not passed --disable-nosem.
I ultimately attributed this unexpected behavior to the upstream semgrep with "--sarif" "# nosemgrep" comments are ignored for python issue.
Do my changes work as expected if I test with a project that does not use Python?
I looked for more issues about nosemgrep and --sarif not working properly and found the "nosemgrep comments in TypeScript seem to be ignored" issue, which led me to what I believe is a 🔑 key piece of information in the "Include ignored findings in SARIF output using suppression syntax" issue:
- When using --sarif (to get the output in SARIF format), findings that are to be ignored with nosemgrep are included in the output with a note that they have been suppressed.
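To make that behavior concrete, here is a minimal sketch of how a consumer of the report might separate suppressed findings from active ones. It assumes the SARIF 2.1.0 layout, where each result may carry a `suppressions` array. The function name `split_by_suppression`, the `hypothetical.*` rule IDs, and the sample data are illustrative assumptions, not output from a real scan.

```python
import json


def split_by_suppression(sarif):
    """Return (active, suppressed) lists of results from a SARIF document."""
    active, suppressed = [], []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A non-empty "suppressions" array marks the finding as suppressed.
            if result.get("suppressions"):
                suppressed.append(result)
            else:
                active.append(result)
    return active, suppressed


# Hypothetical, trimmed-down report: three results, one suppressed in source.
sample = json.loads("""
{
  "runs": [
    {
      "results": [
        {"ruleId": "hypothetical.rule-a"},
        {"ruleId": "hypothetical.rule-b"},
        {
          "ruleId": "eslint.detect-non-literal-regexp",
          "suppressions": [{"kind": "inSource"}]
        }
      ]
    }
  ]
}
""")

active, suppressed = split_by_suppression(sample)
print(len(active), "active,", len(suppressed), "suppressed")
```

With a split like this, a report that lists 3 results can still correspond to only 2 findings that need attention.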
What effect should --disable-nosem have when --sarif is used?
Let's first test with a simple project where we know nosem works. Let's make sure to use a language for which GitLab SAST uses semgrep as the analyzer. It looks like JavaScript is a good example.
semgrep scan --sarif --output semgrep-sarif.json
/usr/local/bin/semgrep -f /rules -o semgrep.sarif --sarif --no-rewrite-rule-ids --strict --disable-version-check --no-git-ignore --exclude spec --exclude test --exclude tests --exclude tmp --metrics on --max-memory 0 --verbose
With the items in qa/fixtures/js/default (remote: git@gitlab.com:gitlab-org/security-products/analyzers/semgrep.git), there are 3 findings.
Parsing the .sarif file with jq '.runs[0].results' semgrep.sarif lets me see the results. Let's mark the one on line 16 with nosem and see what the .sarif report says after that.
found 'nosem' comment, skipping rule 'eslint.detect-non-literal-regexp' on line 16
but also:
Ran 11 rules on 1 file: 3 findings.
- ❓ Does it make sense that there are 3 findings when we know that one of them is to be ignored?
Let's see if the .sarif file says anything special about the finding on line 16.
Yes:
    "properties": {},
    "ruleId": "eslint.detect-non-literal-regexp",
    "suppressions": [
      {
        "kind": "inSource"
      }
    ]
  },
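The same check can be scripted instead of read off by eye. The sketch below looks up results by their starting line and reports the suppression kinds attached to each. It assumes the standard SARIF 2.1.0 fields (`locations[].physicalLocation.region.startLine` and `suppressions[].kind`). The helper name `suppression_kinds_at_line`, the rule ID `hypothetical.other-rule`, and the sample report are made up for illustration.

```python
def suppression_kinds_at_line(sarif, line):
    """Map ruleId -> list of suppression kinds for results starting on `line`."""
    found = {}
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for location in result.get("locations", []):
                region = location.get("physicalLocation", {}).get("region", {})
                if region.get("startLine") == line:
                    # Results without a "suppressions" key yield an empty list.
                    kinds = [s.get("kind") for s in result.get("suppressions", [])]
                    found[result.get("ruleId")] = kinds
    return found


# Hypothetical, trimmed-down report; a real one would be loaded with json.load.
report = {
    "runs": [
        {
            "results": [
                {
                    "ruleId": "eslint.detect-non-literal-regexp",
                    "locations": [
                        {"physicalLocation": {"region": {"startLine": 16}}}
                    ],
                    "suppressions": [{"kind": "inSource"}],
                },
                {
                    "ruleId": "hypothetical.other-rule",
                    "locations": [
                        {"physicalLocation": {"region": {"startLine": 3}}}
                    ],
                },
            ]
        }
    ]
}

print(suppression_kinds_at_line(report, 16))
# {'eslint.detect-non-literal-regexp': ['inSource']}
```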
Then let's see what happens when we pass --disable-nosem:
semgrep scan --disable-nosem --sarif --output semgrep-sarif.json
/usr/local/bin/semgrep -f /rules -o semgrep.sarif --sarif --no-rewrite-rule-ids --strict --disable-version-check --no-git-ignore --exclude spec --exclude test --exclude tests --exclude tmp --metrics on --max-memory 0 --disable-nosem --verbose
We still see that there are 3 findings. There is still an inSource suppression for the item on line 16.
In other words: it seems like the --disable-nosem flag does not change the Scan Summary or the content of the suppressions in the SARIF report.
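One way to check this claim mechanically, rather than by comparing two reports by eye, is to reduce each report to a summary of its suppressions and compare the summaries. This is only a sketch under the same SARIF 2.1.0 assumptions. The names `suppression_summary` and `same_suppressions` are hypothetical, and the in-memory reports below are stand-ins for the two real files, not actual scan output.

```python
def suppression_summary(sarif):
    """Return a sorted list of (ruleId, startLine, suppression kinds) tuples."""
    rows = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Use the first location if present; tolerate results without one.
            locations = result.get("locations") or [{}]
            region = locations[0].get("physicalLocation", {}).get("region", {})
            kinds = tuple(s.get("kind") for s in result.get("suppressions", []))
            rows.append((result.get("ruleId"), region.get("startLine"), kinds))
    # Sort on repr so that missing (None) fields never break the comparison.
    return sorted(rows, key=repr)


def same_suppressions(report_a, report_b):
    """True when both reports contain the same results and suppressions."""
    return suppression_summary(report_a) == suppression_summary(report_b)


def make_report(suppressed):
    """Build a hypothetical one-result report, suppressed or not."""
    result = {
        "ruleId": "eslint.detect-non-literal-regexp",
        "locations": [{"physicalLocation": {"region": {"startLine": 16}}}],
    }
    if suppressed:
        result["suppressions"] = [{"kind": "inSource"}]
    return {"runs": [{"results": [result]}]}


# Stand-ins for the reports produced without and with --disable-nosem.
without_flag = make_report(suppressed=True)
with_flag = make_report(suppressed=True)

print(same_suppressions(without_flag, with_flag))  # True
print(same_suppressions(without_flag, make_report(suppressed=False)))  # False
```

In practice, each report would be loaded from disk with `json.load` before being passed to `same_suppressions`.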
Let's permit nosem (that is, omit --disable-nosem) and drop the --sarif output.
Now, only 2 findings are reported.
Testing
docker run --rm --mount type=bind,source="$(pwd)",target=/tmp/app -it registry.gitlab.com/bcarranza/catssharetheirrules:disablenosemv0 /bin/sh
What I confirmed
- By default, --enable-nosem is enabled.