Hi all,

Below is the security review that I did of the tag2upload design.
I am not a neutral party, in the sense that I think tag2upload is a good idea and should be deployed. However, I do these types of security reviews professionally, and I tried to approach this review the same way that I would approach a major work project that needed a security review to ensure we weren't deploying something with security issues. I encourage any Debian community member with security expertise to check my work; with security reviews, the more eyes, the better. I will also post this review on my web site, probably later tonight if I have time.
# Security review of tag2upload architecture

Last updated 2024-06-02.

## Introduction

tag2upload is an architecture and protocol for uploading a new revision of a Debian package by pushing a signed tag to a repository hosted on Salsa. It has been proposed as an optional alternative to the current (and only) supported mechanism of uploading a signed source package to ftp.upload.debian.org or its SSH equivalent.

This is a security review of the proposed tag2upload architecture as of 2024-06-02. It is based on the following documents retrieved as of that date:

- https://salsa.debian.org/dgit-team/dgit/-/blob/3704eb1397d27dbd25f19d4c7345ba6e2edf5aa1/TAG2UPLOAD-DESIGN.txt
- https://salsa.debian.org/dgit-team/dgit/-/blob/master/tag2upload.5.pod
- https://manpages.debian.org/git-debpush

## Summary of conclusions

Compared to the existing upload architecture, tag2upload provides additional defenses against injection of malicious code into source packages and better traceability of source package contents, at the cost of some minor additional security risk and infrastructure complexity.

I believe tag2upload has somewhat stronger security properties than the current upload mechanism but not a profound advantage. I do not believe it introduces any significant security regressions. The decision on whether to adopt tag2upload should be made primarily on non-security grounds.

I have several recommendations for follow-on work should tag2upload be adopted.

## Terminology

In this document, the following terms have precise meanings:

- _prevent_: Stop an attack before it has been successful in a way that ensures malicious code is never introduced into the archive.
- _detect_: Flag that an attacker may have introduced malicious code into the archive, simultaneous with or after the attempt. The malicious code is still introduced and may propagate.
- _trace_: After malicious code has been detected in either the upload process or in the archive, provide facilities to trace that malicious code to the keys or account used to introduce it, to specific Git commits or history, or otherwise closer to its origin in time and authentication.

Familiarity with two terms from cryptography will also be helpful:

- _collision resistance_: A property of a hash function that makes it infeasible to construct two inputs, both under the control of the attacker, that hash to the same hash digest.
- _second-preimage resistance_: A property of a hash function that makes it infeasible for an attacker, given a hash digest constructed from input that is not under their control, to construct malicious input that hashes to the same hash digest. A second-preimage attack on a hash function is considerably more difficult than a collision attack.

## Threat model

I evaluated both the existing source package upload architecture and the tag2upload architecture against the following threats:

- Someone not in the keyring uploads a malicious source package, possibly via a sponsor.
- Someone in the keyring (either a Debian Developer or a Debian Maintainer for a package) uploads a malicious source package but makes it appear that the package was uploaded by someone else in the keyring.
- An attacker compromises the system a Debian uploader uses to build source packages and uses that access to inject malicious code into a source package.
- Someone with administrative access to the archive processing machinery (DAK, the archive signing key, or similar infrastructure) uploads a malicious source package.
- Someone with administrative access to the tag2upload server or its signing key uploads a malicious source package.
- Someone with administrative access to Salsa uploads a malicious source package.

In each case, I looked at prevention, detection, and tracing.
Neither the existing upload mechanism nor tag2upload attempts to prevent or detect (as opposed to trace) the upload of a malicious source package by someone in full possession of a key in the keyring, so this threat is not considered in this document, although tracing for this threat is discussed briefly.

## Brief architecture summary

### Existing upload system

The existing source package upload mechanism requires that the uploader construct a Debian source package control file (`*.dsc`) for the upload and sign it with an OpenPGP signature from a key on the keyring. This file contains SHA-256 hash digests of each tar or diff file contained in the upload, which together constitute the contents of the source package. The uploader then constructs and signs an upload changes control file (`*.changes`), which contains the SHA-256 hash digests of all files included in the upload, including the source package control file.

These signatures are verified and checked against the relevant keyring on a secure system managed by project delegates. If that check passes, the source package is introduced into the archive and included in archive metadata (via a SHA-256 hash digest of the source package control file) signed by the archive key. This triggers the buildds to download and build the source package and upload a corresponding binary package, which is signed by a buildd OpenPGP key and verified in the same manner as the source package upload.

Only the OpenPGP signature of the archive metadata is verified by systems running Debian. Any package whose SHA-256 hash digest is included in that signed archive metadata is treated as trusted by a Debian system. The original signature on the source package control file, made by the uploader, is preserved and included in the archive, but it is not separately checked by the buildds or tools such as `apt-get source`.

Each of these files contains multiple hash digests, but SHA-256 is the strongest of those hashes.
Multiple hash digests may add some theoretical collision resistance, but this analysis assumes that SHA-256 is secure against collision attacks and therefore does not consider the additional hash digests. Weaknesses in SHA-256 would presumably be addressed by adding a new secure hash.

### tag2upload

tag2upload replaces the first step of this upload process with the following:

1. The uploader pushes a signed tag in a specific format to Salsa. For non-native packages, this may reference an upstream tree in the same Git repository by commit ID, which will be used to create the `orig` tar file if needed.
2. Salsa notifies a web hook on a secure project-maintained system that a new tag of interest has been pushed.
3. That system (with internal privilege separation) retrieves the Git tag and corresponding commit, verifies the signature and tag metadata, and verifies that the signer is in the relevant keyring.
4. Inside a VM or schroot, that system retrieves the Git tree and upstream source tree if applicable, constructs or retrieves the `orig` tar file, and constructs the Debian source package and source package control file. This VM or schroot in essence operates as a source package buildd.
5. The tag2upload server adds control header fields specifying the Git object ID and the identity string and fingerprint of the uploader, signs the resulting source package control file, constructs an upload changes control file, signs it, and creates and signs another Git tag reflecting any additional Git commits that were required to put the repository into a canonical format (the "dgit view").
6. The tag2upload server pushes the original Git tag, its referenced tree, and the additional "dgit view" tag to the publicly-accessible dgit-repos Git server as a permanent archive.
7. The tag2upload server uploads the signed source package to the normal archive incoming queue.

Subsequent processing of the upload happens identically to the existing upload system.
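Both upload paths end in the same chain of SHA-256 digests: the upload changes control file lists the digest of the source package control file, which in turn lists the digests of the tar files. The following is a minimal sketch of that chain, not the real file formats. The one-line-per-file "manifest" format, the file names, and all function names are assumptions made up for illustration, and the OpenPGP signatures that cover each control file are omitted entirely.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_manifest(files: dict[str, bytes]) -> bytes:
    """Toy control file (hypothetical format): one '<sha256> <name>' line per file."""
    lines = [f"{sha256_hex(data)} {name}" for name, data in sorted(files.items())]
    return ("\n".join(lines) + "\n").encode()


def parse_manifest(manifest: bytes) -> dict[str, str]:
    """Map each listed file name to its recorded SHA-256 digest."""
    digests = {}
    for line in manifest.decode().splitlines():
        digest, name = line.split(" ", 1)
        digests[name] = digest
    return digests


def verify_manifest(manifest: bytes, files: dict[str, bytes]) -> bool:
    """True only if every listed file is present and matches its digest."""
    return all(
        name in files and sha256_hex(files[name]) == digest
        for name, digest in parse_manifest(manifest).items()
    )


def verify_upload(changes: bytes, files: dict[str, bytes], dsc_name: str) -> bool:
    """Check the toy .changes, then the toy .dsc that it vouches for."""
    if dsc_name not in files or dsc_name not in parse_manifest(changes):
        return False
    return verify_manifest(changes, files) and verify_manifest(files[dsc_name], files)


# Build the chain: source files -> toy .dsc -> toy .changes.
sources = {
    "pkg_1.0.orig.tar.xz": b"upstream source",
    "pkg_1.0-1.debian.tar.xz": b"debian packaging",
}
dsc = make_manifest(sources)
upload = {**sources, "pkg_1.0-1.dsc": dsc}
changes = make_manifest(upload)
```

In this toy model, `verify_upload(changes, upload, "pkg_1.0-1.dsc")` succeeds, while replacing any tar file or the `.dsc` itself makes it fail. The sketch only illustrates why a single signature over the top-level file is enough to cover every file below it, provided SHA-256 remains collision resistant.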
## Analysis

### Source package construction

The existing upload architecture requires trusting the host used by the uploader to build the source package. If that host is compromised, an attacker could inject malicious code into the source package, either by modifying the upstream tar file (if signed upstream tar files are not used) or by injecting it into the Debian package build system, maintainer scripts, or patches.

This attack is not equivalent to compromise of the uploader's OpenPGP key, which neither upload architecture defends against. Many Debian uploaders build source packages on less-trusted systems where they also build and test binary packages, and then sign the source package from a more-trusted system or use a hardware key.

With tag2upload, the construction of the source package is done by a host that is similar in construction to a buildd, based on the contents of a Git repository on Salsa. This forces the attacker to have access to push commits to Salsa and to risk detection from anyone watching the Salsa repository. Even if they later removed those commits from Salsa, the commits are archived permanently on the dgit-repos Git server.

tag2upload is therefore superior in prevention, detection, and tracing against attacks on uploader systems.

### Sponsored package upload

Debian relies on the sponsor's review to prevent upload of sponsored packages that contain malicious code. This remains true in both upload architectures. However, the tag2upload protocol requires that all code in the uploaded package be committed to Git, which provides more tools and opportunities for the sponsor to detect introduction of malicious code before it is uploaded to the archive.
The prevention benefits of using Git are already available to sponsors if they require sponsored packages to be in Git, so tag2upload does not offer new prevention capabilities for sponsors, but to the extent that sponsors use this upload mechanism it standardizes the availability of those capabilities.

Should a malicious package uploaded via tag2upload later be detected, the Git history provides better tracing of the malicious code than sponsored uploads done without Git.

tag2upload therefore nudges sponsors towards practices that provide better prevention and tracing.

### tag2upload infrastructure

The tag2upload design introduces a new trusted server that can be attacked. This is the flip side of the previous point: moving source package construction off the hosts of each individual uploader requires introducing a new component. If the tag2upload server is compromised, that access could be used to sign and upload malicious source packages that would be accepted by the archive.

#### Conceptual analysis

This is the classic security trade-off of replacing distributed risk with centralized risk. Neither approach is inherently superior. The security analysis depends on an analysis of the nature of the risk.

A rule of thumb in such cases is to prefer distributed risk when defenses are mutually reinforcing (sometimes called the Swiss cheese model). If hardening work done on one system prevents or detects compromises of other systems, the distributed nature of the risk can be a source of resiliency. If, however, each distributed component fails independently and an attacker only has to compromise one of them to achieve their objectives, prefer centralizing that risk so that defensive resources can be concentrated on securing one trusted system. This is often referred to as "put all your eggs in one basket and protect that basket."

Source package construction follows the second pattern.
In the current upload model, each uploader system has to be protected independently, and work done to protect one system doesn't prevent or detect compromises of a different system until the malicious package is already in the archive.

It is difficult to achieve consistent security across all uploader systems due to Debian's highly distributed nature and (intentional) lack of any central management of those systems. Debian can publish best-practice guidelines and provide better tools, but achieving a universal standard of security for uploader source package construction would be difficult.

My security recommendation in this case is therefore to centralize the risk as much as possible, moving it off of individual uploader systems with unknown security profiles and onto a central system that can be analyzed and iteratively improved.

#### tag2upload server

The new tag2upload server architecture introduces a new type of build sandboxing that is similar but not identical to buildds (source package construction requires sufficient network access to Salsa, for example, while buildds can be cut off from the network completely) and new code that has to parse untrusted input.

The sandboxing design of the tag2upload server does a good job of reducing that risk. Signatures are checked early, so only attackers able to create a valid OpenPGP signature with a key in the keyring can attack the most security-sensitive part of the system. The signing key is isolated from both the component that processes incoming requests from Salsa and the component that constructs the source package, only interacting with them via a restricted protocol.

The best way to detect whether the tag2upload server has been compromised would be to independently verify its output via a reproducible source package construction system that starts from the same inputs, namely a signed Git tag on a Salsa repository.
This could be as simple as an independent tag2upload server, or could involve auditing or independent reimplementation of the steps the tag2upload server performs.

We don't have reproducible source package builds today, so this is not a regression. We currently blindly trust whatever the uploader uploads, and the tag2upload proposal does not make that risk worse, merely shifts it to central infrastructure. I therefore don't consider reproducible source builds to be a security prerequisite for adoption of the tag2upload proposal. It is, however, obvious follow-on work that would improve detection of some classes of attacks.

The tag2upload server adds additional source package control fields that identify the signed Git tag on which the source package is based. To the extent that uploaders use tag2upload, this provides a substantial improvement in source package tracing compared to the current heuristics used to associate a source package in the archive with the Git repository that produced it.

#### Source package construction sandbox

Compromise of the tag2upload VM or schroot used for source package construction would be roughly equivalent in impact to the compromise of an amd64 buildd: the tag2upload source package construction sandbox can cause malicious packages to be produced for all architectures, but compromising the amd64 architecture alone would be sufficient to cause considerable damage.

The sandbox requires limited access to the network, which means it is slightly weaker than the isolation used for a buildd. An attacker may be able to leverage malicious code in a Salsa repository that is being uploaded into interactive access to the sandbox during construction of the source package for that repository. However, the sandbox is reset after each operation, so the attacker will still have significant difficulty using that access to compromise other source package builds.
Leveraging that access into a general tag2upload server compromise would require finding an additional security vulnerability in the dgit rpush protocol or in the sandboxing.

At present, a compromise of the amd64 buildd is easier to detect due to the reproducible builds project, which may detect injected malicious code as a regression in binary package reproducibility. As discussed above, the tag2upload construction of a source package from a signed Git tag should also be reproducible, so a reproducible source package system would improve detection to the same level or better than Debian's detection abilities for binary buildds.

### Archive processing

Someone with administrative access to the archive processing machinery, including the archive signing key, could inject a malicious source (or binary) package into the archive. This is not prevented by either the current upload architecture or by tag2upload.

This attack on source packages can be detected by verifying the signatures on all of the source packages in the archive. This remains true after the introduction of tag2upload; some of those source packages will be signed by the tag2upload key instead of a maintainer key, but neither of those keys is available to the archive processing machinery, so the two are equivalent when detecting this specific attack. Compromise of the tag2upload server or its keys is discussed separately above.

tag2upload therefore does not change any security properties of archive processing.

### Salsa compromise

The tag2upload architecture does not generally rely on the security of Salsa, and therefore compromising Salsa mostly does not make it easier to upload a malicious source package. There are two exceptions:

- Administrative access to Salsa would make SHA-1 collision attacks easier, as discussed below. However, this still assumes the attacker is able to create Git trees with colliding hash digests.
- Security vulnerabilities in the Git client used by the tag2upload source package construction sandbox could be exploited by a malicious Salsa Git server to compromise the VM and introduce malicious code into the source package it constructs. Since a malicious Git server could similarly be used to compromise the systems of the numerous Debian contributors who use Salsa via Git clients regularly, I don't believe this introduces substantial new risk, but it does create a new avenue of attack that is possibly less likely to be detected.

### Git object collisions

The current Git repository format and wire protocols use SHA-1 hash digests (and only SHA-1 hash digests) to identify objects in the Git repository. Git uses a SHA-1 hash function that has been [hardened against the SHAttered attack on SHA-1](https://github.com/cr-marcstevens/sha1collisiondetection), and therefore is probably not vulnerable to known collision attacks. Given the widespread use of Git, there is also a reasonable likelihood that any future attacks similar to SHAttered will receive prompt attention from Git maintainers and the broader open source community.

However, SHA-1 is considered cryptographically weak and future collision attacks that will not be defeated by that hardening are possible. Work to add SHA-256 support to Git is [in progress](https://lwn.net/Articles/898522/), but SHA-256 repositories are [not yet supported by the software underlying Salsa](https://gitlab.com/groups/gitlab-org/-/epics/794).

I therefore also analyzed the behavior of the tag2upload protocol assuming that an attacker could generate two Git trees that hash to the same SHA-1 hash digest, one benign and one malicious. I made the additional assumption that the manipulation required to make the hashes collide could be hidden in out-of-the-way places (test files, history) that would not be noticed by a reviewer. I do not believe either of these assumptions is true currently or likely to become true in the near future.
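For readers less familiar with Git internals, the sketch below shows how a SHA-1 object ID is derived for a blob: Git hashes a short header (object type and body length) followed by the object body. A signed tag names a commit ID, the commit names a tree ID, and the tree names blob IDs, so the tag signature covers file contents only through this chain of SHA-1 digests. The function name `git_object_id` is made up for illustration, and the sketch uses Python's plain `hashlib.sha1` rather than the collision-detecting variant linked above, on the assumption that the two produce the same digest for ordinary (non-attack) inputs.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object ID of a Git object with the given type and body."""
    # Git hashes "<type> <length>\0" followed by the raw object body.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The well-known ID of the empty blob.
empty_blob = git_object_id("blob", b"")
# → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Any change to the content yields a different ID, unless a collision is found.
benign = git_object_id("blob", b"benign content\n")
malicious = git_object_id("blob", b"malicious content\n")
```

The attacks analyzed in this section all assume an attacker who can make two different objects, such as `benign` and `malicious` above, produce the same ID.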
My understanding is that hash collision attacks on Git repositories of the type required to attack tag2upload are currently entirely theoretical and are likely to remain so for some time to come. This section is therefore a somewhat pedantic exercise in thoroughness, rather than an analysis of an attack I consider likely.

This analysis assumes the attacker has full access to the Salsa repository to which the tag2upload tags are pushed, since this will be a common scenario for sponsored uploads.

This analysis is relevant only for SHA-1-based Git repositories. Once Salsa supports SHA-256 Git repositories, tag2upload could decline to act on any repository that uses SHA-1 hash digests, making this entire section moot.

#### Replaying the tag

The attack: Construct a benign and malicious Git repository pair. Present the benign repository to a sponsor for review. Once the sponsor has pushed a signed tag for the benign repository, push the same tag to the malicious Git repository, triggering tag2upload processing of that repository.

Since the signed tag that triggers tag2upload processing must specify both the source package and version, the malicious repository must be for the same source package and version. The attacker must therefore win a race against the sponsor so that the malicious package is uploaded first.

This attack is noisy and thus vulnerable to detection. The sponsor would receive two tag2upload notifications and an error from whichever upload lost the race. The notification for the malicious repository would also include its Salsa URL, which would not match the Salsa URL that the sponsor was expecting.

#### Moving the tag

The attack: As above, construct a benign and malicious Git repository pair and get the benign repository signed by a sponsor. Race the tag2upload server by deleting the signed tag from the benign repository, and then push the same tag to the malicious Git repository.

This attack removes the doubled notification of the previous attack.
However, the race window is narrow, particularly if the attacker wants to avoid an error notification to the sponsor from the tag2upload server after it starts processing and then is unable to pull the signed tag from the benign repository. Such an error would be suspicious and might lead to detection.

As with the previous attack, the tag2upload notification to the sponsor would include the Salsa URL of the malicious repository rather than the one that the sponsor signed, which increases the likelihood of detection.

#### Replacing the upstream tree

The attack: Construct a benign and malicious Git tree pair containing only the upstream source. Reference the benign tree in a source package and get that source package signed by a sponsor to trigger tag2upload processing. Race the tag2upload server by deleting the upstream tag and commit ID and then pushing the malicious Git repository as a new commit with the same commit ID.

The upstream tag name is present in the signed tag metadata, but since that tag itself is not required to be signed, the attacker can move it at will. The upstream tag therefore provides no protection against this attack apart from a small detection risk. Authentication of the upstream tree comes only from the inclusion of its commit ID in the tag metadata.

I suspect (but am not certain) that this attack would normally be prevented by the Salsa Git service. The benign tree already existed in the same repository with the referenced commit ID (presumed to be checked by the sponsor during review), and even if references to that object are deleted via branch deletion, I believe Git will reject the push of the malicious commit ID until the old objects have been garbage-collected. This presumably will take long enough that the tag2upload process will fail because the upstream commit is missing.
This attack could be done by someone with administrative access to Salsa, and thus in a position to force an immediate garbage collection of the unreferenced objects so that the tree underlying the upstream commit ID can be replaced. Administrative access to Salsa would also make it trivial to win the race against the tag2upload server.

This attack is less prone to detection than moving the tag to a different Salsa repository.

There is a variation on this attack where the attacker deletes the Git tag and tree that it references, pushes a colliding tree, and then repushes the Git tag. I believe this has essentially the same properties as the above attack.

### Second-preimage attacks

An attacker could attempt to take the signature from a tag2upload tag pushed to an arbitrary repository on Salsa and apply that same signature to a Git tag for a malicious package on Salsa with the same version number, triggering the tag2upload process. However, this requires constructing a malicious repository with the same SHA-1 hash digest as the repository containing the original tag. This is a second-preimage attack on SHA-1, which is believed to be currently infeasible. (Second-preimage attacks are believed to be currently infeasible even against MD5, which is a much weaker hash function.) tag2upload therefore prevents this attack.

This same attack could be tried against the existing upload mechanism by attempting to reuse the signature of an upload changes control file published in the [debian-devel-changes list archive](https://lists.debian.org/debian-devel-changes/). The second-preimage resistance of the hash function used by the OpenPGP signature similarly prevents this attack.

### Hash weakness in the OpenPGP signature

The OpenPGP signature over a source package control file, upload changes control file, or Git tag also relies on a hash function to create the digest that the public key signs.
To protect against collision attacks on the OpenPGP signature, both the existing upload system and tag2upload should require that the signature use a strong hash function, such as SHA-256. It is not clear to me what hash functions the current upload architecture permits; this may have already been done there. This restriction is part of the tag2upload design.

## Conclusions

The tag2upload architecture has some security advantages over the existing upload architecture:

- Source package construction for those who use tag2upload is sandboxed, run on trusted systems, and less prone to variations or compromise from differences in the local build environment on every Debian uploader's personal systems.
- Each source package is associated with a Git repository and tag, which are permanently archived on the dgit-repos Git server. This provides more granular tracing information than retaining only the uploaded artifacts, which may or may not be traceable to a VCS repository or tag.
- Sponsors are nudged towards reviewing sponsored packages in Git if they wish to use the tag2upload service, which gives them more tools to notice introduction of malicious code and prevent it from being uploaded.

It introduces some new security risks, all of which I consider minor:

- The tag2upload server introduces another trusted component and some additional attack surface. The additional risk seems manageable, but is not zero. This centralizes risk that previously was distributed across the systems of individual uploaders.
- The tag2upload protocol relies on collision resistance of SHA-1 hashes of Git repository objects. Some attacks, particularly against a package sponsorship workflow, are possible if an attacker can construct a benign and malicious Git repository pair with the same SHA-1 hash digest. This attack is at present theoretical and seems likely to remain very challenging for some time to come.


I believe widespread adoption of tag2upload would represent a security improvement for Debian. The availability of a more secure source package construction system outweighs, in my opinion, the small additional risks it would introduce. I do not believe it introduces any significant security regressions.

Were tag2upload adopted, I would recommend some follow-on work:

- Verify that there are securely-archived backups of the dgit-repos Git server, since they contain useful information for tracing any discovered malicious packages.
- Set up a reproducible source package construction system that can independently verify the construction of a source package from a signed Git tag, which would provide additional assurance that the tag2upload server has not been compromised. This may be as simple as running an independent tag2upload server on separate infrastructure that would not be compromised by a compromise of Debian project systems.
- Adopt SHA-256 support for Git in Salsa as soon as it is available. Restrict tag2upload support to repositories using SHA-256 once the support is mature.

The purpose of the tag2upload design is not purely to improve the security of source package uploads. It is intended to enable a new workflow that some Debian contributors may prefer. In most cases in software architecture, the security cost of new features can be reduced but not eliminated entirely. There is some irreducible security risk from introducing a new feature and thus new attack surface. Whether the risk is worth the benefit is not a decision that can be made by a security review.

-- Russ Allbery (r...@debian.org) <https://www.eyrie.org/~eagle/>