This is a copy of the markdown from the edk2 discussion where it has
been posted as a draft:
https://github.com/tianocore/edk2/discussions/3258#discussioncomment-3682099
Please go to that link to see a rendered version of the text.
---
# Adoption of CodeQL in edk2
This RFC proposes adoption of CodeQL as a static analysis tool used in
the TianoCore edk2 project.
## Introduction
CodeQL is open source and free for open-source projects. It is
maintained by GitHub and naturally has excellent integration with GitHub
projects. CodeQL generates a "database" during the firmware build
process that enables queries to run against that database. Many
open-source queries are officially supported and comprise the
vulnerability analysis performed against the database. These queries are
maintained here - https://github.com/github/codeql.
Queries are written in an object-oriented query language called QL.
CodeQL provides:
1. A [command-line (CLI)
interface](https://codeql.github.com/docs/codeql-cli/#codeql-cli)
2. A [VS Code
extension](https://codeql.github.com/docs/codeql-for-visual-studio-code/#codeql-for-visual-studio-code)
to help write queries and run queries
3. [GitHub action](https://github.com/github/codeql-action) support for
repo integration via [code
scanning](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning)
4. In addition to other features described in the [CodeQL
overview](https://codeql.github.com/docs/codeql-overview/)
[LGTM](https://lgtm.com/) is often paired with CodeQL so you may read
about that in documentation. LGTM was acquired by GitHub a few years ago
and all of the functionality has been moved to GitHub code scanning. You
can read more about this process on the [GitHub blog
post](https://github.blog/2022-08-15-the-next-step-for-lgtm-com-github-code-scanning/).
CodeQL is an actively maintained project. Here is a comparison of edk2
commit activity versus CodeQL for reference:
- [CodeQL Commit
Activity](https://github.com/github/codeql/graphs/commit-activity)
- [edk2 Commit
Activity](https://github.com/github/codeql/graphs/commit-activity)
Because CodeQL does maintain a strong open-source presence, the
TianoCore community should be able to file
[issues](https://github.com/github/codeql/issues) and [pull
requests](https://github.com/github/codeql/pulls) into the project.
## CodeQL Usage in edk2
CodeQL provides the capability to debug the actual queries and for our
(Tianocore) community to write our own queries and even contribute back
to the upstream repo when appropriate. In other cases, we might choose
to keep our own queries in a separate TianoCore repo or within a
directory in the edk2 code tree.
This is all part of CodeQL Scanning and [this
page](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning)
has information concerning how to configure CodeQL scanning within a
GitHub project such as edk2. Information on the particular topic of
running additional custom queries is documented
[here](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#running-additional-queries)
in that page.
In addition, CodeQL presents the flexibility to:
- Build databases locally
- Retrieve databases from server builds
- Relatively quickly test queries locally against a database for a fast
feedback loop
- Suppress false positives
- Customize the files and queries used in the edk2 project and quickly
keep this list in sync between the server and local execution
### Dismissing CodeQL Alerts
The following documentation describes how to dismiss alerts:
[Dismissing
Alerts](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/managing-code-scanning-alerts-for-your-repository#dismissing--alerts)
> _Note:_ If query has a false positive a GitHub Issue can be submitted
in the [CodeQL repo issues
page](https://github.com/github/codeql/issues) with the `false-positive`
tag to help improve the query.
### CodeQL in Pull Requests
The proposal is to enable CodeQL in a step-by-step fashion. The goal
with this approach is to make steady progress enabling CodeQL to become
more comprehensive and useful while not impacting day-to-day code
contributions.
Throughout the process described in this section, CodeQL Code Scanning
will be a mandatory status check for edk2 pull requests.
It is proposed to make no changes to the PR evaluation process already
in place that only builds packages with changes in the pull request.
#### Query Target List
The first step is to define a target list of queries for edk2. While
CodeQL has the ability run against several languages ([including
Python](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#changing-the-languages-that-are-analyzed)),
I propose we enable C/C++ first and then return to evaluate Python analysis.
I propose the following as an initial candidate list:
-
[cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql)
-
[cpp/infinite-loop-with-unsatisfiable-exit-condition](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-835/InfiniteLoopWithUnsatisfiableExitCondition.ql)
-
[cpp/overflow-buffer](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-119/OverflowBuffer.ql)
-
[cpp/pointer-overflow-check](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/PointerOverflow.ql)
-
[cpp/potential-buffer-overflow](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/PotentialBufferOverflow.ql)
-
[cpp/toctou-race-condition](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-367/TOCTOUFilesystemRace.ql)
-
[cpp/unclear-array-index-validation](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-129/ImproperArrayIndexValidation.ql)
-
[cpp/unsafe-strncat](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/SuspiciousCallToStrncat.ql)
-
[cpp/use-after-free](https://github.com/github/codeql/blob/main/cpp/ql/src/Critical/UseAfterFree.ql)
-
[cpp/user-controlled-null-termination-tainted](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-170/ImproperNullTerminationTainted.ql)
-
[cpp/wrong-number-format-arguments](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Format/WrongNumberOfFormatArguments.ql)
-
[cpp/wrong-type-format-argument](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Format/WrongTypeFormatArguments.ql)
CodeQL query files (.ql files) contain metadata about the query. For
example,
[cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql)
states the following about the query:
```plaintext
/**
* @name Conditionally uninitialized variable
* @description An initialization function is used to initialize a
local variable, but the
* returned status code is not checked. The variable may
be left in an uninitialized
* state, and reading the variable may result in undefined
behavior.
* @kind problem
* @problem.severity warning
* @security-severity 7.8
* @id cpp/conditionally-uninitialized-variable
* @tags security
* external/cwe/cwe-457
*/
```
We can automatically include queries against these criteria using "query
filters". For example, this could include any `problem` query above a
certain `security-severity` level. Or all queries with `security` in `tags`.
That would mean new queries checked into the CodeQL repo could cause
unexpected build breaks. Since edk2 favors consistency in CI results, I
propose we start with the fixed query set proposed at the top of this
section.
> _Note:_ Additional queries can be found here as well -
https://lgtm.com/search?q=cpp&t=rules
##### Suggesting New Queries
It is proposed new queries be enabled by sending an RFC to the TianoCore
development mailing list (devel@edk2.groups.io) with the query link and
justification for adopting the query in edk2. Anyone is welcome to
suggest new queries.
#### Enable One Query at a Time
Enabling the candidate list of queries immediately will trigger hundreds
of alerts.
Therefore, I recommend we enable one query at a time. The PR to enable
each query can be accompanied by the code fixes for the query to pass.
If a query is deemed fruitless during enabling testing, it can simply be
rejected. The goal is to enable an effective set of queries that improve
the codebase. As the list of enabled queries builds, the total CodeQL
coverage will increase against active PRs.
I recommend we start with a query that has a very low alert frequency to
enable the initial GitHub Action and build momentum for enabling
additional queries.
It is proposed that each query be enabled in a "query staging branch"
where the query is added to the edk2 query suite and each package
contributes their patches to the branch to fix any issues necessary for
the query to be successful. Once that branch is ready, it can be
converted into a patch series that enables the query for the codebase in
a single pull request.
That being said, we can [configure the severities that cause pull
request
failure](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#defining-the-severities-causing-pull-request-check-failure).
If we find a useful informational query, that could be enabled without
impacting the pull request status result. As of this RFC, those queries
are not proposed.
#### edk2 Pull Request Example
I put together a [PR 3179: Test
CodeQL](https://github.com/tianocore/edk2/pull/3179) that demonstrates a
basic CodeQL GitHub action.
You can see the CodeQL link in the PR Checks area ([direct link to
results](https://github.com/tianocore/edk2/pull/3179/checks?check_run_id=7817354780)):

You can also see the run on the
[edk2/actions](https://github.com/tianocore/edk2/actions) page:

This initial test was mainly to ensure permissions were set up to allow
the action to run and to settle other logistics in getting the action
(first GitHub action in edk2) setup and working properly.
#### edk2 Pull Request Time
We can roughly expect CodeQL to take an hour to complete against a pull
request. Since this would be the longest running task in a pull request
in parallel to other jobs, pull requests should be expected to take
around an hour to have all status checks finished. This is not
considered an issue since:
1. Results can be checked locally
2. The TianoCore contribution process requires patches to be on the
mailing list for a minimum of 24 hours
3. Pull requests can be created at any time to check results
CodeQL has the [ability to ignore certain file
paths](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#avoiding-unnecessary-scans-of-pull-requests)
and I recommend we leverage that to avoid running CodeQL on pull
requests that only change the maintainers.txt file for example.
#### CodeQL Scheduled Builds
While pull requests will only build packages deemed necessary based on
the PR evaluation heuristics used today, the plan is to have a 24
scheduled build that will build all of the packages and place the
results in GitHub Code Scanning.
We will use this process to understand if there any differences
discovered. Based on the results, we might choose to adjust our strategy
for running CodeQL in pull requests to prevent future discrepancies. If
differences are present, it is proposed maintainers of the affected
package resolve those issues.
### CodeQL Locally
The [CodeQL CLI](https://codeql.github.com/docs/codeql-cli/) can be used
as follows to wrap around the edk2 build process (`MdeModulePkg` in this
case) to generate a database in the directory `cpp-database`. Example is
shown using
[stuart](https://github.com/tianocore/edk2-pytool-extensions) build command.
```cmd
codeql database create cpp-database --language=cpp
--command="stuart_ci_build -c .pytool/CISettings.py -p MdeModulePkg -a
IA32,X64 TOOL_CHAIN_TAG=VS2019 Target=DEBUG --clean --skip-post-build"
--overwrite
```
The following command can be used to generate a [SARIF
file](https://codeql.github.com/docs/codeql-cli/sarif-output/) (called
`query-results.sarif`) from that database with the results of the
[cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql)
query:
```cmd
codeql database analyze cpp-database
codeql\cpp\ql\src\Security\CWE\CWE-457\ConditionallyUninitializedVariable.ql
--format=sarifv2.1.0 --output=query-results.sarif
```
SARIF logs can be read by log viewers such as the [Sarif
Viewer](https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer)
extension for [VS Code](https://code.visualstudio.com/).
---
Once an edk2 query suite (.qls) file is created, that same file can be
used for pull requests and by developers locally using the CodeQL CLI to
run the same set of queries with the `analyze` command shown above.
Developers can add new queries to the file and confirm results before
attempting server builds.
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#94259): https://edk2.groups.io/g/devel/message/94259
Mute This Topic: https://groups.io/mt/93881031/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-