
Using semgrep

While looking at the changes required to implement the "Add support to SAST for the --disable-nosem option" issue, I noticed that --disable-nosem and --sarif seemed to interact in a curious way.

What I observed

When I passed --disable-nosem to the semgrep executable, the results seemed to be the same as when I had not passed it.

I ultimately attributed this unexpected behavior to the upstream semgrep issue 'with "--sarif" "# nosemgrep" comments are ignored for python'.

Do my changes work as expected if I test with a project that does not use Python?

I looked for more issues about nosemgrep and --sarif not quite working properly and found the "nosemgrep comments in TypeScript seem to be ignored" issue. That led me to what I believe is a 🔑 key piece of information, in the "Include ignored findings in SARIF output using suppression syntax" issue:

  • When using --sarif (to get the output in SARIF format), findings that are to be ignored with nosemgrep are included in the output with a note that they have been suppressed.

What effect should --disable-nosem have when --sarif is used?

Let's first test with a simple project where we know nosem works. Let's make sure to use a language where GitLab SAST uses semgrep as the analyzer. It looks like JavaScript is a good example.

semgrep scan --sarif --output semgrep-sarif.json
/usr/local/bin/semgrep -f /rules -o semgrep.sarif --sarif --no-rewrite-rule-ids --strict --disable-version-check --no-git-ignore --exclude spec --exclude test --exclude tests --exclude tmp --metrics on --max-memory 0  --verbose

With the items in qa/fixtures/js/default (remote: git@gitlab.com:gitlab-org/security-products/analyzers/semgrep.git), there are 3 findings.

Parsing the .sarif file with jq '.runs[0].results' semgrep.sarif lets me see the results. Let's mark the one on line 16 with nosem and see what the .sarif report says after that.

found 'nosem' comment, skipping rule 'eslint.detect-non-literal-regexp' on line 16

but also:

Ran 11 rules on 1 file: 3 findings.
  • ❓ Does it make sense that there are 3 findings when we know that one of them is to be ignored?

Let's see if the .sarif file says anything special about the finding on line 16.

Yes:

    "properties": {},
    "ruleId": "eslint.detect-non-literal-regexp",
    "suppressions": [
      {
        "kind": "inSource"
      }
    ]
  },
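To make the difference between "reported" and "suppressed" concrete, here is a minimal Python sketch that splits SARIF results by whether they carry a non-empty suppressions array. The sample data is a hypothetical, trimmed-down stand-in for semgrep.sarif: only the rule ID eslint.detect-non-literal-regexp and line 16 come from the report above, and the other rule IDs and line numbers are placeholders.

```python
import json

# Hypothetical, trimmed-down SARIF document. Only the rule ID
# "eslint.detect-non-literal-regexp" and line 16 come from the report above;
# the other rule IDs and line numbers are placeholders.
SAMPLE_SARIF = json.loads("""
{
  "runs": [
    {
      "results": [
        {
          "ruleId": "example.rule-a",
          "locations": [{"physicalLocation": {"region": {"startLine": 4}}}]
        },
        {
          "ruleId": "eslint.detect-non-literal-regexp",
          "locations": [{"physicalLocation": {"region": {"startLine": 16}}}],
          "suppressions": [{"kind": "inSource"}]
        },
        {
          "ruleId": "example.rule-b",
          "locations": [{"physicalLocation": {"region": {"startLine": 23}}}]
        }
      ]
    }
  ]
}
""")


def split_by_suppression(sarif):
    """Return (active, suppressed) lists of results from the first run."""
    active, suppressed = [], []
    for result in sarif["runs"][0]["results"]:
        # A non-empty "suppressions" array marks the result as suppressed.
        if result.get("suppressions"):
            suppressed.append(result)
        else:
            active.append(result)
    return active, suppressed


active, suppressed = split_by_suppression(SAMPLE_SARIF)
print(f"total: {len(active) + len(suppressed)}")
print(f"active: {len(active)}")
print(f"suppressed: {[r['ruleId'] for r in suppressed]}")
```

To run this against a real report, the sample could be replaced with the parsed contents of the file, for example json.load(open("semgrep.sarif")). With the sample data, the total is 3 while only 2 results are active, which would be consistent with a summary that counts suppressed findings too.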

Then let's see what happens when we pass --disable-nosem:

semgrep scan --disable-nosem --sarif --output semgrep-sarif.json
/usr/local/bin/semgrep -f /rules -o semgrep.sarif --sarif --no-rewrite-rule-ids --strict --disable-version-check --no-git-ignore --exclude spec --exclude test --exclude tests --exclude tmp --metrics on --max-memory 0 --disable-nosem --verbose

We still see that there are 3 findings. There is still an inSource suppression for the item on line 16.

In other words: it seems like the --disable-nosem flag does not change the Scan Summary or the content of the suppressions in the SARIF report.
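One way to check this observation without reading the reports by eye is to reduce each SARIF report to a small "fingerprint" of its suppressions and compare the two. The following is only a sketch: make_result and make_report are hypothetical helpers that build stand-in reports, and the rule IDs and line numbers other than eslint.detect-non-literal-regexp on line 16 are placeholders.

```python
def suppression_fingerprint(sarif):
    """Map (ruleId, startLine) to the sorted suppression kinds of each result."""
    fingerprint = {}
    for result in sarif["runs"][0]["results"]:
        region = result["locations"][0]["physicalLocation"]["region"]
        kinds = sorted(s.get("kind", "") for s in result.get("suppressions", []))
        fingerprint[(result["ruleId"], region["startLine"])] = kinds
    return fingerprint


def make_result(rule_id, line, suppressed=False):
    """Build a minimal SARIF result (hypothetical helper for this sketch)."""
    result = {
        "ruleId": rule_id,
        "locations": [{"physicalLocation": {"region": {"startLine": line}}}],
    }
    if suppressed:
        result["suppressions"] = [{"kind": "inSource"}]
    return result


def make_report(results):
    """Wrap results in the minimal SARIF structure used here (hypothetical)."""
    return {"runs": [{"results": results}]}


# Stand-ins for the report without and with --disable-nosem. They are built
# identically on purpose, to mirror what was observed above.
default_report = make_report([
    make_result("example.rule-a", 4),
    make_result("eslint.detect-non-literal-regexp", 16, suppressed=True),
    make_result("example.rule-b", 23),
])
disable_nosem_report = make_report([
    make_result("example.rule-a", 4),
    make_result("eslint.detect-non-literal-regexp", 16, suppressed=True),
    make_result("example.rule-b", 23),
])

same = suppression_fingerprint(default_report) == suppression_fingerprint(
    disable_nosem_report
)
print("suppressions identical:", same)
```

If --disable-nosem did affect the SARIF output, one would expect the suppression on line 16 to disappear from the second report and the comparison to come out False.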

Let's allow nosem again (no --disable-nosem) and drop the --sarif output.

Now, only 2 findings are reported.

Test Files: code | ruleset


Testing

docker run --rm --mount type=bind,source="$(pwd)",target=/tmp/app  -it registry.gitlab.com/bcarranza/catssharetheirrules:disablenosemv0 /bin/sh

What I confirmed

  • By default, nosem handling is on: --enable-nosem is the default behavior.

📚 Useful Resources