[edk2-devel] [RFC] Adoption of CodeQL in edk2

Michael Kubacki Fri, 23 Sep 2022 15:18:26 -0700

This is a copy of the markdown from the edk2 discussion where it has been posted as a draft:


https://github.com/tianocore/edk2/discussions/3258#discussioncomment-3682099


Please go to that link to see a rendered version of the text.

---

#  Adoption of CodeQL in edk2

This RFC proposes adoption of CodeQL as a static analysis tool used in the TianoCore edk2 project.


## Introduction

CodeQL is open source and free for open-source projects. It is maintained by GitHub and naturally has excellent integration with GitHub projects. CodeQL generates a "database" during the firmware build process that enables queries to run against that database. Many open-source queries are officially supported and comprise the vulnerability analysis performed against the database. These queries are maintained here - https://github.com/github/codeql.

Queries are written in an object-oriented query language called QL. CodeQL provides:

1. A [command-line (CLI) interface](https://codeql.github.com/docs/codeql-cli/#codeql-cli)
2. A [VS Code extension](https://codeql.github.com/docs/codeql-for-visual-studio-code/#codeql-for-visual-studio-code) to help write and run queries
3. [GitHub action](https://github.com/github/codeql-action) support for repo integration via [code scanning](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning)
4. Other features described in the [CodeQL overview](https://codeql.github.com/docs/codeql-overview/)

[LGTM](https://lgtm.com/) is often paired with CodeQL, so you may read about it in documentation. LGTM was acquired by GitHub a few years ago and all of the functionality has been moved to GitHub code scanning. You can read more about this process in the [GitHub blog post](https://github.blog/2022-08-15-the-next-step-for-lgtm-com-github-code-scanning/).

CodeQL is an actively maintained project. Here is a comparison of edk2 commit activity versus CodeQL for reference:

- [CodeQL Commit Activity](https://github.com/github/codeql/graphs/commit-activity)
- [edk2 Commit Activity](https://github.com/tianocore/edk2/graphs/commit-activity)

Because CodeQL does maintain a strong open-source presence, the TianoCore community should be able to file [issues](https://github.com/github/codeql/issues) and [pull requests](https://github.com/github/codeql/pulls) into the project.


## CodeQL Usage in edk2

CodeQL provides the capability to debug the actual queries and for our (TianoCore) community to write our own queries and even contribute back to the upstream repo when appropriate. In other cases, we might choose to keep our own queries in a separate TianoCore repo or within a directory in the edk2 code tree.

This is all part of CodeQL Scanning, and [this page](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning) has information on how to configure CodeQL scanning within a GitHub project such as edk2. Running additional custom queries is documented [here](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#running-additional-queries) on that page.


In addition, CodeQL presents the flexibility to:

- Build databases locally
- Retrieve databases from server builds
- Relatively quickly test queries locally against a database for a fast feedback loop
- Suppress false positives
- Customize the files and queries used in the edk2 project and quickly keep this list in sync between the server and local execution


### Dismissing CodeQL Alerts

The following documentation describes how to dismiss alerts:

[Dismissing Alerts](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/managing-code-scanning-alerts-for-your-repository#dismissing--alerts)

> _Note:_ If a query has a false positive, a GitHub issue can be submitted on the [CodeQL repo issues page](https://github.com/github/codeql/issues) with the `false-positive` tag to help improve the query.


### CodeQL in Pull Requests

The proposal is to enable CodeQL in a step-by-step fashion. The goal with this approach is to make steady progress enabling CodeQL to become more comprehensive and useful while not impacting day-to-day code contributions.

Throughout the process described in this section, CodeQL Code Scanning will be a mandatory status check for edk2 pull requests.

It is proposed to make no changes to the PR evaluation process already in place that only builds packages with changes in the pull request.


#### Query Target List

The first step is to define a target list of queries for edk2. While CodeQL has the ability to run against several languages ([including Python](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#changing-the-languages-that-are-analyzed)), I propose we enable C/C++ first and then return to evaluate Python analysis.


I propose the following as an initial candidate list:

- [cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql)
- [cpp/infinite-loop-with-unsatisfiable-exit-condition](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-835/InfiniteLoopWithUnsatisfiableExitCondition.ql)
- [cpp/overflow-buffer](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-119/OverflowBuffer.ql)
- [cpp/pointer-overflow-check](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/PointerOverflow.ql)
- [cpp/potential-buffer-overflow](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/PotentialBufferOverflow.ql)
- [cpp/toctou-race-condition](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-367/TOCTOUFilesystemRace.ql)
- [cpp/unclear-array-index-validation](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-129/ImproperArrayIndexValidation.ql)
- [cpp/unsafe-strncat](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/SuspiciousCallToStrncat.ql)
- [cpp/use-after-free](https://github.com/github/codeql/blob/main/cpp/ql/src/Critical/UseAfterFree.ql)
- [cpp/user-controlled-null-termination-tainted](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-170/ImproperNullTerminationTainted.ql)
- [cpp/wrong-number-format-arguments](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Format/WrongNumberOfFormatArguments.ql)
- [cpp/wrong-type-format-argument](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Format/WrongTypeFormatArguments.ql)

CodeQL query files (.ql files) contain metadata about the query. For example, [cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql) states the following about the query:


```plaintext
/**
 * @name Conditionally uninitialized variable
 * @description An initialization function is used to initialize a local variable, but the
 *              returned status code is not checked. The variable may be left in an uninitialized
 *              state, and reading the variable may result in undefined behavior.
 * @kind problem
 * @problem.severity warning
 * @security-severity 7.8
 * @id cpp/conditionally-uninitialized-variable
 * @tags security
 *       external/cwe/cwe-457
 */
```

We can automatically include queries against these criteria using "query filters". For example, this could include any `problem` query above a certain `security-severity` level, or all queries with `security` in `tags`.

That would mean new queries checked into the CodeQL repo could cause unexpected build breaks. Since edk2 favors consistency in CI results, I propose we start with the fixed query set proposed at the top of this section.
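To make the trade-off concrete, here is a minimal Python sketch of the selection logic a metadata-based filter implies. It is not how CodeQL implements query filters (those are expressed in CodeQL's own configuration); the function names `parse_metadata` and `is_selected`, the combined rule, and the `7.0` threshold are assumptions chosen only for illustration. The sample header is the metadata shown above, with the description omitted for brevity.

```python
import re

# Metadata header from the example above (description omitted for brevity).
SAMPLE_HEADER = """/**
 * @name Conditionally uninitialized variable
 * @kind problem
 * @problem.severity warning
 * @security-severity 7.8
 * @id cpp/conditionally-uninitialized-variable
 * @tags security
 *       external/cwe/cwe-457
 */"""


def parse_metadata(header):
    """Map each @property in a QL doc comment to its (possibly multi-line) value."""
    metadata = {}
    current = None
    for raw in header.splitlines():
        line = raw.strip()
        if line in ("/**", "*/"):
            continue
        line = line.lstrip("*").strip()
        if not line:
            continue
        match = re.match(r"@(\S+)\s*(.*)", line)
        if match:
            current = match.group(1)
            metadata[current] = match.group(2).strip()
        elif current is not None:
            # Continuation line: append it to the previous property.
            metadata[current] = (metadata[current] + " " + line).strip()
    return metadata


def is_selected(metadata, min_severity=7.0):
    """Hypothetical rule: `problem` queries tagged `security` at or above a severity."""
    tags = metadata.get("tags", "").split()
    severity = float(metadata.get("security-severity", "0"))
    return (
        metadata.get("kind") == "problem"
        and "security" in tags
        and severity >= min_severity
    )
```

Any upstream query whose metadata satisfies such a rule would be picked up as soon as it lands, which is the source of the unexpected build breaks mentioned above. A fixed list avoids that.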

> _Note:_ Additional queries can be found here as well - https://lgtm.com/search?q=cpp&t=rules


##### Suggesting New Queries

It is proposed that new queries be enabled by sending an RFC to the TianoCore development mailing list (devel@edk2.groups.io) with the query link and justification for adopting the query in edk2. Anyone is welcome to suggest new queries.


#### Enable One Query at a Time

Enabling the candidate list of queries immediately will trigger hundreds of alerts.

Therefore, I recommend we enable one query at a time. The PR to enable each query can be accompanied by the code fixes for the query to pass. If a query is deemed fruitless during enabling testing, it can simply be rejected. The goal is to enable an effective set of queries that improve the codebase. As the list of enabled queries builds, the total CodeQL coverage will increase against active PRs.

I recommend we start with a query that has a very low alert frequency to enable the initial GitHub Action and build momentum for enabling additional queries.

It is proposed that each query be enabled in a "query staging branch" where the query is added to the edk2 query suite and each package contributes their patches to the branch to fix any issues necessary for the query to be successful. Once that branch is ready, it can be converted into a patch series that enables the query for the codebase in a single pull request.

That being said, we can [configure the severities that cause pull request failure](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#defining-the-severities-causing-pull-request-check-failure). If we find a useful informational query, that could be enabled without impacting the pull request status result. As of this RFC, those queries are not proposed.


#### edk2 Pull Request Example

I put together a [PR 3179: TestCodeQL](https://github.com/tianocore/edk2/pull/3179) that demonstrates a basic CodeQL GitHub action.

You can see the CodeQL link in the PR Checks area ([direct link to results](https://github.com/tianocore/edk2/pull/3179/checks?check_run_id=7817354780)):

![image](https://user-images.githubusercontent.com/21320094/187554321-6fb230f8-a870-468f-911e-861b046a1396.png)

You can also see the run on the [edk2/actions](https://github.com/tianocore/edk2/actions) page:

![image](https://user-images.githubusercontent.com/21320094/187554464-651313ea-8fad-4881-9e4c-fba0b102d444.png)

This initial test was mainly to ensure permissions were set up to allow the action to run and to settle other logistics in getting the action (the first GitHub action in edk2) set up and working properly.


#### edk2 Pull Request Time

We can roughly expect CodeQL to take an hour to complete against a pull request. Since this would be the longest running task in a pull request in parallel to other jobs, pull requests should be expected to take around an hour to have all status checks finished. This is not considered an issue since:

1. Results can be checked locally

2. The TianoCore contribution process requires patches to be on the mailing list for a minimum of 24 hours

3. Pull requests can be created at any time to check results

CodeQL has the [ability to ignore certain file paths](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#avoiding-unnecessary-scans-of-pull-requests) and I recommend we leverage that to avoid running CodeQL on pull requests that only change the maintainers.txt file, for example.
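The decision this path filtering makes can be sketched in a few lines of Python. This is only an illustration of the logic, not the GitHub mechanism itself: the function name `needs_codeql_scan` and the pattern list are hypothetical, and Python's `fnmatch` globbing is an approximation of the pattern syntax GitHub uses.

```python
from fnmatch import fnmatchcase

# Hypothetical ignore list; only maintainers.txt is mentioned in this RFC.
IGNORED_PATTERNS = ["maintainers.txt", "*.md"]


def needs_codeql_scan(changed_files, ignored_patterns=IGNORED_PATTERNS):
    """Return True if at least one changed file is not covered by an ignore pattern."""
    for path in changed_files:
        if not any(fnmatchcase(path, pattern) for pattern in ignored_patterns):
            return True
    return False
```

A pull request is skipped only when every changed file matches an ignored pattern, so a change that touches both maintainers.txt and a source file is still scanned.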


#### CodeQL Scheduled Builds

While pull requests will only build packages deemed necessary based on the PR evaluation heuristics used today, the plan is to have a 24-hour scheduled build that will build all of the packages and place the results in GitHub Code Scanning.

We will use this process to understand if there are any differences discovered. Based on the results, we might choose to adjust our strategy for running CodeQL in pull requests to prevent future discrepancies. If differences are present, it is proposed that maintainers of the affected package resolve those issues.


### CodeQL Locally

The [CodeQL CLI](https://codeql.github.com/docs/codeql-cli/) can be used as follows to wrap around the edk2 build process (`MdeModulePkg` in this case) to generate a database in the directory `cpp-database`. The example is shown using the [stuart](https://github.com/tianocore/edk2-pytool-extensions) build command.


```cmd

codeql database create cpp-database --language=cpp --command="stuart_ci_build -c .pytool/CISettings.py -p MdeModulePkg -a IA32,X64 TOOL_CHAIN_TAG=VS2019 Target=DEBUG --clean --skip-post-build" --overwrite

```
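For developers who build more than one package, the same invocation can be assembled programmatically. The following is a minimal Python sketch that only rearranges the command shown above; the helper names `build_create_command` and `create_database` and their default arguments are assumptions for illustration and are not part of CodeQL or stuart.

```python
import subprocess


def build_create_command(database, package, architectures=("IA32", "X64"),
                         toolchain="VS2019", target="DEBUG"):
    """Return the argument list for `codeql database create` wrapping a stuart build."""
    build_command = (
        f"stuart_ci_build -c .pytool/CISettings.py -p {package} "
        f"-a {','.join(architectures)} TOOL_CHAIN_TAG={toolchain} "
        f"Target={target} --clean --skip-post-build"
    )
    return [
        "codeql", "database", "create", database,
        "--language=cpp",
        f"--command={build_command}",
        "--overwrite",
    ]


def create_database(database, package):
    """Run the command; requires `codeql` and `stuart_ci_build` on the PATH."""
    subprocess.run(build_create_command(database, package), check=True)
```

Passing the arguments as a list avoids shell quoting issues around the embedded build command. Calling `create_database("cpp-database", "MdeModulePkg")` would run the equivalent of the command above.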

The following command can be used to generate a [SARIF file](https://codeql.github.com/docs/codeql-cli/sarif-output/) (called `query-results.sarif`) from that database with the results of the [cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql) query:


```cmd

codeql database analyze cpp-database codeql\cpp\ql\src\Security\CWE\CWE-457\ConditionallyUninitializedVariable.ql --format=sarifv2.1.0 --output=query-results.sarif

```

SARIF logs can be read by log viewers such as the [Sarif Viewer](https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer) extension for [VS Code](https://code.visualstudio.com/).
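Because SARIF is plain JSON, results can also be summarized with a short script. The sketch below assumes the standard SARIF 2.1.0 layout (`runs`, `results`, `ruleId`, `locations`); the helper names are hypothetical and error handling is kept minimal.

```python
import json
from collections import Counter


def load_sarif(path):
    """Parse a SARIF log file into a Python dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_sarif(sarif):
    """Count results per rule ID across all runs in a parsed SARIF log."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("ruleId", "<unknown>")] += 1
    return counts


def first_location(result):
    """Return (uri, start_line) for a result's first location, or None if absent."""
    locations = result.get("locations", [])
    if not locations:
        return None
    physical = locations[0].get("physicalLocation", {})
    uri = physical.get("artifactLocation", {}).get("uri")
    line = physical.get("region", {}).get("startLine")
    return uri, line
```

For example, `summarize_sarif(load_sarif("query-results.sarif"))` would give the number of alerts per query ID for the file generated above.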

---

Once an edk2 query suite (.qls) file is created, that same file can be used for pull requests and by developers locally using the CodeQL CLI to run the same set of queries with the `analyze` command shown above. Developers can add new queries to the file and confirm results before attempting server builds.



