This is a copy of the markdown from the edk2 discussion where it has been posted as a draft:

https://github.com/tianocore/edk2/discussions/3258#discussioncomment-3682099

Please go to that link to see a rendered version of the text.

---

#  Adoption of CodeQL in edk2

This RFC proposes adoption of CodeQL as a static analysis tool used in the TianoCore edk2 project.

## Introduction

CodeQL is open source and free for open-source projects. It is maintained by GitHub and naturally has excellent integration with GitHub projects. CodeQL generates a "database" during the firmware build process that enables queries to run against that database. Many open-source queries are officially supported and comprise the vulnerability analysis performed against the database. These queries are maintained here - https://github.com/github/codeql.

Queries are written in an object-oriented query language called QL. CodeQL provides: 1. A [command-line (CLI) interface](https://codeql.github.com/docs/codeql-cli/#codeql-cli) 2. A [VS Code extension](https://codeql.github.com/docs/codeql-for-visual-studio-code/#codeql-for-visual-studio-code) to help write queries and run queries 3. [GitHub action](https://github.com/github/codeql-action) support for repo integration via [code scanning](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning) 4. In addition to other features described in the [CodeQL overview](https://codeql.github.com/docs/codeql-overview/)

[LGTM](https://lgtm.com/) is often paired with CodeQL so you may read about that in documentation. LGTM was acquired by GitHub a few years ago and all of the functionality has been moved to GitHub code scanning. You can read more about this process on the [GitHub blog post](https://github.blog/2022-08-15-the-next-step-for-lgtm-com-github-code-scanning/).

CodeQL is an actively maintained project. Here is a comparison of edk2 commit activity versus CodeQL for reference:

- [CodeQL Commit Activity](https://github.com/github/codeql/graphs/commit-activity) - [edk2 Commit Activity](https://github.com/github/codeql/graphs/commit-activity)

Because CodeQL does maintain a strong open-source presence, the TianoCore community should be able to file [issues](https://github.com/github/codeql/issues) and [pull requests](https://github.com/github/codeql/pulls) into the project.

## CodeQL Usage in edk2

CodeQL provides the capability to debug the actual queries and for our (Tianocore) community to write our own queries and even contribute back to the upstream repo when appropriate. In other cases, we might choose to keep our own queries in a separate TianoCore repo or within a directory in the edk2 code tree.

This is all part of CodeQL Scanning and [this page](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning) has information concerning how to configure CodeQL scanning within a GitHub project such as edk2. Information on the particular topic of running additional custom queries is documented [here](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#running-additional-queries) in that page.

In addition, CodeQL presents the flexibility to:
- Build databases locally
- Retrieve databases from server builds
- Relatively quickly test queries locally against a database for a fast feedback loop
- Suppress false positives
- Customize the files and queries used in the edk2 project and quickly keep this list in sync between the server and local execution

### Dismissing CodeQL Alerts

The following documentation describes how to dismiss alerts:
[Dismissing Alerts](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/managing-code-scanning-alerts-for-your-repository#dismissing--alerts)

> _Note:_ If query has a false positive a GitHub Issue can be submitted in the [CodeQL repo issues page](https://github.com/github/codeql/issues) with the `false-positive` tag to help improve the query.

### CodeQL in Pull Requests

The proposal is to enable CodeQL in a step-by-step fashion. The goal with this approach is to make steady progress enabling CodeQL to become more comprehensive and useful while not impacting day-to-day code contributions.

Throughout the process described in this section, CodeQL Code Scanning will be a mandatory status check for edk2 pull requests.

It is proposed to make no changes to the PR evaluation process already in place that only builds packages with changes in the pull request.

#### Query Target List

The first step is to define a target list of queries for edk2. While CodeQL has the ability run against several languages ([including Python](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#changing-the-languages-that-are-analyzed)), I propose we enable C/C++ first and then return to evaluate Python analysis.

I propose the following as an initial candidate list:

- [cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql) - [cpp/infinite-loop-with-unsatisfiable-exit-condition](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-835/InfiniteLoopWithUnsatisfiableExitCondition.ql) - [cpp/overflow-buffer](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-119/OverflowBuffer.ql) - [cpp/pointer-overflow-check](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/PointerOverflow.ql) - [cpp/potential-buffer-overflow](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/PotentialBufferOverflow.ql) - [cpp/toctou-race-condition](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-367/TOCTOUFilesystemRace.ql) - [cpp/unclear-array-index-validation](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-129/ImproperArrayIndexValidation.ql) - [cpp/unsafe-strncat](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/SuspiciousCallToStrncat.ql) - [cpp/use-after-free](https://github.com/github/codeql/blob/main/cpp/ql/src/Critical/UseAfterFree.ql) - [cpp/user-controlled-null-termination-tainted](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-170/ImproperNullTerminationTainted.ql) - [cpp/wrong-number-format-arguments](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Format/WrongNumberOfFormatArguments.ql) - [cpp/wrong-type-format-argument](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Format/WrongTypeFormatArguments.ql)

CodeQL query files (.ql files) contain metadata about the query. For example, [cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql) states the following about the query:

```plaintext
/**
 * @name Conditionally uninitialized variable
* @description An initialization function is used to initialize a local variable, but the * returned status code is not checked. The variable may be left in an uninitialized * state, and reading the variable may result in undefined behavior.
 * @kind problem
 * @problem.severity warning
 * @security-severity 7.8
 * @id cpp/conditionally-uninitialized-variable
 * @tags security
 *       external/cwe/cwe-457
 */
```

We can automatically include queries against these criteria using "query filters". For example, this could include any `problem` query above a certain `security-severity` level. Or all queries with `security` in `tags`.

That would mean new queries checked into the CodeQL repo could cause unexpected build breaks. Since edk2 favors consistency in CI results, I propose we start with the fixed query set proposed at the top of this section.

> _Note:_ Additional queries can be found here as well - https://lgtm.com/search?q=cpp&t=rules

##### Suggesting New Queries

It is proposed new queries be enabled by sending an RFC to the TianoCore development mailing list (devel@edk2.groups.io) with the query link and justification for adopting the query in edk2. Anyone is welcome to suggest new queries.

#### Enable One Query at a Time

Enabling the candidate list of queries immediately will trigger hundreds of alerts.

Therefore, I recommend we enable one query at a time. The PR to enable each query can be accompanied by the code fixes for the query to pass. If a query is deemed fruitless during enabling testing, it can simply be rejected. The goal is to enable an effective set of queries that improve the codebase. As the list of enabled queries builds, the total CodeQL coverage will increase against active PRs.

I recommend we start with a query that has a very low alert frequency to enable the initial GitHub Action and build momentum for enabling additional queries.

It is proposed that each query be enabled in a "query staging branch" where the query is added to the edk2 query suite and each package contributes their patches to the branch to fix any issues necessary for the query to be successful. Once that branch is ready, it can be converted into a patch series that enables the query for the codebase in a single pull request.

That being said, we can [configure the severities that cause pull request failure](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#defining-the-severities-causing-pull-request-check-failure). If we find a useful informational query, that could be enabled without impacting the pull request status result. As of this RFC, those queries are not proposed.

#### edk2 Pull Request Example

I put together a [PR 3179: Test CodeQL](https://github.com/tianocore/edk2/pull/3179) that demonstrates a basic CodeQL GitHub action.

You can see the CodeQL link in the PR Checks area ([direct link to results](https://github.com/tianocore/edk2/pull/3179/checks?check_run_id=7817354780)):
![image](https://user-images.githubusercontent.com/21320094/187554321-6fb230f8-a870-468f-911e-861b046a1396.png)

You can also see the run on the [edk2/actions](https://github.com/tianocore/edk2/actions) page:
![image](https://user-images.githubusercontent.com/21320094/187554464-651313ea-8fad-4881-9e4c-fba0b102d444.png)

This initial test was mainly to ensure permissions were set up to allow the action to run and to settle other logistics in getting the action (first GitHub action in edk2) setup and working properly.

#### edk2 Pull Request Time

We can roughly expect CodeQL to take an hour to complete against a pull request. Since this would be the longest running task in a pull request in parallel to other jobs, pull requests should be expected to take around an hour to have all status checks finished. This is not considered an issue since:
1. Results can be checked locally
2. The TianoCore contribution process requires patches to be on the mailing list for a minimum of 24 hours
3. Pull requests can be created at any time to check results

CodeQL has the [ability to ignore certain file paths](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#avoiding-unnecessary-scans-of-pull-requests) and I recommend we leverage that to avoid running CodeQL on pull requests that only change the maintainers.txt file for example.

#### CodeQL Scheduled Builds

While pull requests will only build packages deemed necessary based on the PR evaluation heuristics used today, the plan is to have a 24 scheduled build that will build all of the packages and place the results in GitHub Code Scanning.

We will use this process to understand if there any differences discovered. Based on the results, we might choose to adjust our strategy for running CodeQL in pull requests to prevent future discrepancies. If differences are present, it is proposed maintainers of the affected package resolve those issues.

### CodeQL Locally

The [CodeQL CLI](https://codeql.github.com/docs/codeql-cli/) can be used as follows to wrap around the edk2 build process (`MdeModulePkg` in this case) to generate a database in the directory `cpp-database`. Example is shown using [stuart](https://github.com/tianocore/edk2-pytool-extensions) build command.

```cmd
codeql database create cpp-database --language=cpp --command="stuart_ci_build -c .pytool/CISettings.py -p MdeModulePkg -a IA32,X64 TOOL_CHAIN_TAG=VS2019 Target=DEBUG --clean --skip-post-build" --overwrite
```

The following command can be used to generate a [SARIF file](https://codeql.github.com/docs/codeql-cli/sarif-output/) (called `query-results.sarif`) from that database with the results of the [cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql) query:

```cmd
codeql database analyze cpp-database codeql\cpp\ql\src\Security\CWE\CWE-457\ConditionallyUninitializedVariable.ql --format=sarifv2.1.0 --output=query-results.sarif
```

SARIF logs can be read by log viewers such as the [Sarif Viewer](https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer) extension for [VS Code](https://code.visualstudio.com/).

---

Once an edk2 query suite (.qls) file is created, that same file can be used for pull requests and by developers locally using the CodeQL CLI to run the same set of queries with the `analyze` command shown above. Developers can add new queries to the file and confirm results before attempting server builds.


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#94259): https://edk2.groups.io/g/devel/message/94259
Mute This Topic: https://groups.io/mt/93881031/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Reply via email to