Security review of tag2upload

Russ Allbery Tue, 11 Jun 2024 18:39:40 -0700

Hi all,

Below is the security review that I did of the tag2upload design.


I am not a neutral party, in the sense that I think tag2upload is a good
idea and should be deployed.  However, I do these types of security
reviews professionally, and I tried to approach this review the same way
that I would approach a major work project that needed a security review
to ensure we weren't deploying something with security issues.  I
encourage any Debian community member with security expertise to check my
work; with security reviews, the more eyes, the better.

I will also post this review on my web site, probably later tonight if I
have time.

# Security review of tag2upload architecture

Last updated 2024-06-02.

## Introduction

tag2upload is an architecture and protocol for uploading a new revision of
a Debian package by pushing a signed tag to a repository hosted on
Salsa. It has been proposed as an optional alternative to the current (and
only) supported mechanism of uploading a signed source package to
ftp.upload.debian.org or its SSH equivalent.

This is a security review of the proposed tag2upload architecture as of
2024-06-02. It is based on the following documents retrieved as of that
date:

- https://salsa.debian.org/dgit-team/dgit/-/blob/3704eb1397d27dbd25f19d4c7345ba6e2edf5aa1/TAG2UPLOAD-DESIGN.txt
- https://salsa.debian.org/dgit-team/dgit/-/blob/master/tag2upload.5.pod
- https://manpages.debian.org/git-debpush

## Summary of conclusions

Compared to the existing upload architecture, tag2upload provides
additional defenses against injection of malicious code into source
packages and better traceability of source package contents, at the cost
of some minor additional security risk and infrastructure complexity. I
believe tag2upload has somewhat stronger security properties than the
current upload mechanism but not a profound advantage. I do not believe
it introduces any significant security regressions.

The decision on whether to adopt tag2upload should be made primarily on
non-security grounds.

I have several recommendations for follow-on work should tag2upload be
adopted.

## Terminology

In this document, the following terms have precise meanings:

- _prevent_: Stop an attack before it has been successful in a way that
  ensures malicious code is never introduced into the archive.
  
- _detect_: Flag that an attacker may have introduced malicious code into
  the archive, simultaneous with or after the attempt. The malicious code
  is still introduced and may propagate.
  
- _trace_: After malicious code has been detected in either the upload
  process or in the archive, provide facilities to trace that malicious
  code to the keys or account used to introduce it, to specific Git
  commits or history, or otherwise closer to its origin in time and
  authentication.

Familiarity with two terms from cryptography will also be helpful:

- _collision resistance_: A property of a hash function that makes it
  infeasible to construct two inputs, both under the control of the
  attacker, that hash to the same hash digest.

- _second-preimage resistance_: A property of a hash function that makes
  it infeasible for an attacker, given a hash digest constructed from
  input that is not under their control, to construct malicious input that
  hashes to the same hash digest. A second-preimage attack on a hash
  function is considerably more difficult than a collision attack.

## Threat model

I evaluated both the existing source package upload architecture and the
tag2upload architecture against the following threats:

- Someone not in the keyring uploads a malicious source package, possibly
  via a sponsor.
  
- Someone in the keyring (either a Debian Developer or a Debian Maintainer
  for a package) uploads a malicious source package but makes it appear
  that the package was uploaded by someone else in the keyring.

- An attacker compromises the system a Debian uploader uses to build
  source packages and uses that access to inject malicious code into a
  source package.

- Someone with administrative access to the archive processing machinery
  (DAK, the archive signing key, or similar infrastructure) uploads a
  malicious source package.
  
- Someone with administrative access to the tag2upload server or its
  signing key uploads a malicious source package.
  
- Someone with administrative access to Salsa uploads a malicious source
  package.

In each case, I looked at prevention, detection, and tracing.

Neither the existing upload mechanism nor tag2upload attempts to prevent or
detect (as opposed to trace) the upload of a malicious source package by
someone in full possession of a key in the keyring, so this threat is not
considered in this document, although tracing for this threat is
discussed briefly.

## Brief architecture summary

### Existing upload system

The existing source package upload mechanism requires that the uploader
construct a Debian source package control file (`*.dsc`) for the upload
and sign it with an OpenPGP signature from a key on the keyring. This file
contains SHA-256 hash digests of each tar or diff file contained in the
upload, which together constitute the contents of the source package. The
uploader then constructs and signs an upload changes control file
(`*.changes`), which contains the SHA-256 hash digests of all files
included in the upload, including the source package control file.
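
To make the hash-digest binding concrete, here is a minimal Python sketch of checking a source package's files against the `Checksums-Sha256` field of its `*.dsc`. It assumes the OpenPGP signature has already been verified and stripped, and that the field uses the usual one-file-per-continuation-line layout; it is an illustration, not how dak or dpkg-source actually implement the check. The same pattern applies to the `*.changes` file, whose own `Checksums-Sha256` field covers the `*.dsc`.

```python
import hashlib


def parse_checksums_sha256(dsc_text: str) -> dict[str, tuple[str, int]]:
    """Return {filename: (sha256_hex, size)} from the Checksums-Sha256 field."""
    entries: dict[str, tuple[str, int]] = {}
    in_field = False
    for line in dsc_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
        elif in_field:
            if line[:1] not in (" ", "\t"):
                break  # any non-continuation line ends the field
            digest, size, name = line.split()
            entries[name] = (digest, int(size))
    return entries


def verify_files(dsc_text: str, files: dict[str, bytes]) -> list[str]:
    """Return names of listed files that are missing or whose size or digest differs."""
    bad = []
    for name, (digest, size) in parse_checksums_sha256(dsc_text).items():
        data = files.get(name)
        if data is None or len(data) != size or hashlib.sha256(data).hexdigest() != digest:
            bad.append(name)
    return bad
```

An empty list means every file named in the `*.dsc` is present with the expected size and SHA-256 digest.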

These signatures are verified and checked against the relevant keyring on
a secure system managed by project delegates. If that check passes, the
source package is introduced into the archive and included in archive
metadata (via a SHA-256 hash digest of the source package control file)
signed by the archive key. This triggers the buildds to download and build
the source package and upload a corresponding binary package, which is
signed by a buildd OpenPGP key and verified in the same manner as the
source package upload.

Only the OpenPGP signature of the archive metadata is verified by systems
running Debian. Any package whose SHA-256 hash digest is included in that
signed archive metadata is treated as trusted by a Debian system. The
original signature on the source package control file, made by the
uploader, is preserved and included in the archive, but it is not
separately checked by the buildds or tools such as `apt-get source`.

Each of these files contains multiple hash digests, but SHA-256 is the
strongest of those hashes. Multiple hash digests may add some theoretical
collision resistance, but this analysis assumes that SHA-256 is secure
against collision attacks and therefore does not consider the additional
hash digests. Weaknesses in SHA-256 would presumably be addressed by
adding a new secure hash.

### tag2upload

tag2upload replaces the first step of this upload process with the
following:

1. The uploader pushes a signed tag in a specific format to Salsa. For
   non-native packages, this may reference an upstream tree in the same
   Git repository by commit ID, which will be used to create the `orig`
   tar file if needed.

2. Salsa notifies a web hook on a secure project-maintained system that a
   new tag of interest has been pushed.

3. That system (with internal privilege separation) retrieves the Git tag
   and corresponding commit, verifies the signature and tag metadata, and
   verifies that the signer is in the relevant keyring.

4. Inside a VM or schroot, that system retrieves the Git tree and upstream
   source tree if applicable, constructs or retrieves the `orig` tar file,
   and constructs the Debian source package and source package control
   file.  This VM or schroot in essence operates as a source package
   buildd.

5. The tag2upload server adds control header fields specifying the Git
   object ID and the identity string and fingerprint of the uploader,
   signs the resulting source package control file, constructs an upload
   changes control file, signs it, and creates and signs another Git tag
   reflecting any additional Git commits that were required to put the
   repository into a canonical format (the "dgit view").

6. The tag2upload server pushes the original Git tag, its referenced tree,
   and the additional "dgit view" tag to the publicly-accessible
   dgit-repos Git server as a permanent archive.

7. The tag2upload server uploads the signed source package to the normal
   archive incoming queue.

Subsequent processing of the upload happens identically to the existing
upload system.
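
The authorization check in step 3 can be sketched as follows. The sketch assumes the OpenPGP signature on the tag has already been verified and yielded a signer fingerprint. The bracketed `key=value` metadata format and the mapping from fingerprints to uploadable packages are hypothetical simplifications; the real tag format and permission rules are specified in tag2upload(5) and the Debian keyrings.

```python
def parse_metadata(message: str) -> dict[str, str]:
    """Collect key=value tokens from bracketed lines of a tag message.

    Hypothetical simplified format, e.g. "[upload source=foo version=1.0-1]".
    """
    fields: dict[str, str] = {}
    for line in message.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            for token in line[1:-1].split():
                key, sep, value = token.partition("=")
                if sep:  # skip bare words such as the leading keyword
                    fields[key] = value
    return fields


def authorize(signer_fpr: str, message: str, keyring: dict[str, set[str]]) -> str:
    """Return "" if the upload may proceed, else a rejection reason.

    keyring maps fingerprints to the source packages that key may upload,
    with "*" standing for any package (a stand-in for DD versus DM rights).
    """
    allowed = keyring.get(signer_fpr)
    if allowed is None:
        return "signer not in keyring"  # rejected before any further parsing
    meta = parse_metadata(message)
    if "source" not in meta or "version" not in meta:
        return "tag metadata lacks source or version"
    if "*" not in allowed and meta["source"] not in allowed:
        return f"signer may not upload {meta['source']}"
    return ""
```

Checking the signer before parsing anything else mirrors the design point that only holders of a key in the keyring can reach the more sensitive parts of the server.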

## Analysis

### Source package construction

The existing upload architecture requires trusting the host used by the
uploader to build the source package. If that host is compromised, an
attacker could inject malicious code into the source package, either by
modifying the upstream tar file (if signed upstream tar files are not
used) or by injecting it into the Debian package build system, maintainer
scripts, or patches.

This attack is not equivalent to compromise of the uploader's OpenPGP key,
which neither upload architecture defends against. Many Debian uploaders
build source packages on less-trusted systems where they also build and
test binary packages, and then sign the source package from a more-trusted
system or use a hardware key.

With tag2upload, the construction of the source package is done by a host
that is similar in construction to a buildd, based on the contents of a
Git repository on Salsa. This forces the attacker to have access to push
commits to Salsa and to risk detection from anyone watching the Salsa
repository. Even if they later removed those commits from Salsa, the
commits are archived permanently on the dgit-repos Git server.

tag2upload is therefore superior in prevention, detection, and tracing
against attacks on uploader systems.

### Sponsored package upload

Debian relies on the sponsor's review to prevent upload of sponsored
packages that contain malicious code. This remains true in both upload
architectures. However, the tag2upload protocol requires that all code in
the uploaded package be committed to Git, which provides more tools and
opportunities for the sponsor to detect introduction of malicious code
before it is uploaded to the archive. The prevention benefits of using Git
are already available to sponsors if they require sponsored packages to be
in Git, so tag2upload does not offer new prevention capabilities for
sponsors, but to the extent that sponsors use this upload mechanism it
standardizes the availability of those capabilities.

Should a malicious package uploaded via tag2upload later be detected, the
Git history provides better tracing of the malicious code than sponsored
uploads done without Git.

tag2upload therefore nudges sponsors towards practices that provide better
prevention and tracing.

### tag2upload infrastructure

The tag2upload design introduces a new trusted server that can be
attacked. This is the flip side of the previous point: moving source
package construction off the hosts of each individual uploader requires
introducing a new component. If the tag2upload server is compromised, that
access could be used to sign and upload malicious source packages that
would be accepted by the archive.

#### Conceptual analysis

This is the classic security trade-off of replacing distributed risk with
centralized risk. Neither approach is inherently superior; the right
choice depends on the nature of the risk.

A rule of thumb in such cases is to prefer distributed risk when defenses
are mutually reinforcing (sometimes called the Swiss cheese model). If
hardening work done on one system prevents or detects compromises of other
systems, the distributed nature of the risk can be a source of resiliency.
If, however, each distributed component fails independently and an
attacker only has to compromise one of them to achieve their objectives,
prefer centralizing that risk so that defensive resources can be
concentrated on securing one trusted system. This is often referred to as
"put all your eggs in one basket and protect that basket."
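
The two patterns can be made concrete with elementary probability, under the strong simplifying assumption that compromises are independent. If an attacker needs only one of n systems, each compromised with probability p, the chance that at least one falls is 1 - (1 - p)^n. If defenses are layered so that the attacker must defeat all of k layers, each failing with probability p, the chance is p^k. The numbers below are purely illustrative, not estimates of any real risk to Debian.

```python
def p_any_compromised(p: float, n: int) -> float:
    """Chance that at least one of n independent systems is compromised."""
    return 1 - (1 - p) ** n


def p_all_layers_defeated(p: float, k: int) -> float:
    """Chance that an attacker defeats all k independent defensive layers."""
    return p ** k


# Illustrative only: 1000 uploader systems at 0.1% each versus one central
# system at 1%.
distributed = p_any_compromised(0.001, 1000)  # roughly 0.63
central = p_any_compromised(0.01, 1)          # 0.01
```

In the first pattern each additional independently defended system raises the attacker's odds, while in the Swiss cheese pattern each additional layer lowers them.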

Source package construction follows the second pattern. In the current
upload model, each uploader system has to be protected independently, and
work done to protect one system doesn't prevent or detect compromises of a
different system until the malicious package is already in the archive. It
is difficult to achieve consistent security across all uploader systems
due to Debian's highly distributed nature and (intentional) lack of any
central management of those systems. Debian can publish best-practice
guidelines and provide better tools, but achieving a universal standard of
security for uploader source package construction would be difficult.

My security recommendation in this case is therefore to centralize the
risk as much as possible, moving it off of individual uploader systems
with unknown security profiles and onto a central system that can be
analyzed and iteratively improved.

#### tag2upload server

The new tag2upload server architecture introduces a new type of build
sandboxing that is similar but not identical to buildds (source package
construction requires sufficient network access to Salsa, for example,
while buildds can be cut off from the network completely) and new code
that has to parse untrusted input.

The sandboxing design of the tag2upload server does a good job of reducing
that risk. Signatures are checked early, so only attackers able to create
a valid OpenPGP signature with a key in the keyring can attack the most
security-sensitive part of the system. The signing key is isolated from
both the component that processes incoming requests from Salsa and the
component that constructs the source package, only interacting with them
via a restricted protocol.

The best way to detect whether the tag2upload server has been compromised
would be to independently verify its output via a reproducible source
package construction system that starts from the same inputs, namely a
signed Git tag on a Salsa repository. This could be as simple as an
independent tag2upload server, or could involve auditing or independent
reimplementation of the steps the tag2upload server performs.
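
One way such an independent check could work is sketched below: rebuild the source package from the same signed tag on separate infrastructure and compare SHA-256 digests file by file. This is a hypothetical illustration. In practice the signed `*.dsc` would differ at least in its OpenPGP signature, so the comparison would need to strip the signature or compare only the unsigned content and the tar files.

```python
import hashlib


def digests(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file name to the SHA-256 hex digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def compare_builds(official: dict[str, bytes], rebuilt: dict[str, bytes]) -> list[str]:
    """Return the sorted names of files that differ or exist on only one side."""
    a, b = digests(official), digests(rebuilt)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result means the independent rebuild agrees with the tag2upload output; any name returned is a candidate for investigation.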

We don't have reproducible source package builds today, so this is not a
regression. We currently blindly trust whatever the uploader uploads, and
the tag2upload proposal does not make that risk worse, merely shifts it to
central infrastructure. I therefore don't consider reproducible source
builds to be a security prerequisite for adoption of the tag2upload
proposal. It is, however, obvious follow-on work that would improve
detection of some classes of attacks.

The tag2upload server adds additional source package control fields that
identify the signed Git tag on which the source package is based. To the
extent that uploaders use tag2upload, this provides a substantial
improvement in source package tracing compared to the current heuristics
used to associate a source package in the archive with the Git repository
that produced it.
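
A tracing tool might extract that information from a `*.dsc` roughly as sketched below. The field name `Git-Tag-Info` and its `tag=`/`fp=` layout are assumptions made for illustration; the authoritative field names are in the tag2upload design documents. The parser handles only the first paragraph of unsigned deb822 text.

```python
def parse_control_fields(text: str) -> dict[str, str]:
    """Parse the first paragraph of a deb822-style file (minimal, no signature handling)."""
    fields: dict[str, str] = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the paragraph
        if line[0] in " \t" and key is not None:
            fields[key] += "\n" + line.strip()  # continuation line
        else:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def trace_info(dsc_text: str, field: str = "Git-Tag-Info") -> dict[str, str]:
    """Split a 'tag=<object id> fp=<fingerprint>' style field into a dict.

    The default field name is hypothetical.
    """
    raw = parse_control_fields(dsc_text).get(field, "")
    return dict(token.split("=", 1) for token in raw.split() if "=" in token)
```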

#### Source package construction sandbox

Compromise of the tag2upload VM or schroot used for source package
construction would be roughly equivalent in impact to the compromise of an
amd64 buildd: the tag2upload source package construction sandbox can cause
malicious packages to be produced for all architectures, but compromising
the amd64 architecture alone would be sufficient to cause considerable
damage.

The sandbox requires limited access to the network, which means it is
slightly weaker than the isolation used for a buildd. An attacker may be
able to leverage malicious code in a Salsa repository that is being
uploaded into interactive access to the sandbox during construction of the
source package for that repository. However, the sandbox is reset after
each operation, so the attacker will still have significant difficulty
using that access to compromise other source package builds. Leveraging
that access into a general tag2upload server compromise would require
finding an additional security vulnerability in the dgit rpush protocol or
in the sandboxing.

At present, a compromise of the amd64 buildd is easier to detect due to
the reproducible builds project, which may detect injected malicious code
as a regression in binary package reproducibility. As discussed above, the
tag2upload construction of a source package from a signed Git tag should
also be reproducible, so a reproducible source package system would
improve detection to the same level or better than Debian's detection
abilities for binary buildds.

### Archive processing

Someone with administrative access to the archive processing machinery,
including the archive signing key, could inject a malicious source (or
binary) package into the archive. This is not prevented by either the
current upload architecture or by tag2upload.

This attack on source packages can be detected by verifying the signatures
on all of the source packages in the archive. This remains true after the
introduction of tag2upload; some of those source packages will be signed
by the tag2upload key instead of a maintainer key, but neither of those
keys is available to the archive processing machinery, so the two are
equivalent when detecting this specific attack. Compromise of the
tag2upload server or its keys is discussed separately above.

tag2upload therefore does not change any security properties of archive
processing.

### Salsa compromise

The tag2upload architecture does not generally rely on the security of
Salsa, and therefore compromising Salsa mostly does not make it easier
to upload a malicious source package. There are two exceptions:

- Administrative access to Salsa would make SHA-1 collision attacks
  easier, as discussed below. However, this still assumes the attacker is
  able to create Git trees with colliding hash digests.

- Security vulnerabilities in the Git client used by the tag2upload source
  package construction sandbox could be exploited by a malicious Salsa Git
  server to compromise the VM and introduce malicious code into the source
  package it constructs. Since a malicious Git server could similarly be
  used to compromise the systems of the numerous Debian contributors who
  use Salsa via Git clients regularly, I don't believe this introduces
  substantial new risk, but it does create a new avenue of attack that is
  possibly less likely to be detected.

### Git object collisions

The current Git repository format and wire protocols use SHA-1 hash
digests (and only SHA-1 hash digests) to identify objects in the Git
repository. Git uses a SHA-1 hash function that has been
[hardened against the SHAttered attack on SHA-1](https://github.com/cr-marcstevens/sha1collisiondetection),
and therefore is probably not vulnerable to known collision attacks. Given
the widespread use of Git, there is also a reasonable likelihood that any
future attacks similar to SHAttered will receive prompt attention from Git
maintainers and the broader open source community. However, SHA-1 is
considered cryptographically weak and future collision attacks that will
not be defeated by that hardening are possible.
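
For readers less familiar with Git internals, the sketch below shows how an object ID is derived: Git hashes a short header (the object type, a space, the content length in decimal, and a NUL byte) followed by the raw content. Python's `hashlib.sha1` is plain SHA-1 rather than Git's collision-detecting variant, but the two produce the same digest for ordinary, non-attack inputs. SHA-256 repositories use the same construction with a different hash function.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes, algo: str = "sha1") -> str:
    """Compute a Git object ID: hash of b"<type> <length>" + NUL + content."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.new(algo, header + content).hexdigest()
```

For example, `git_object_id("blob", b"")` yields `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known ID of the empty blob. Because trees and commits embed the IDs of the objects they reference, a single colliding object can be substituted without changing any ID above it.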

Work to add SHA-256 support to Git is
[in progress](https://lwn.net/Articles/898522/) but is
[not yet supported by the software underlying Salsa](https://gitlab.com/groups/gitlab-org/-/epics/794).
I therefore also analyzed the behavior of the tag2upload protocol assuming
that an attacker could generate two Git trees that hash to the same SHA-1
hash digest, one benign and one malicious. I made the additional
assumption that the manipulation required to make the hashes collide could
be hidden in out-of-the-way places (test files, history) that would not be
noticed by a reviewer.

I do not believe either of these assumptions is true currently or likely
to become true in the near future. My understanding is that hash collision
attacks on Git repositories of the type required to attack tag2upload are
currently entirely theoretical and are likely to remain so for some time
to come. This section is therefore a somewhat pedantic exercise in
thoroughness, rather than an analysis of an attack I consider likely.

This analysis assumes the attacker has full access to the Salsa
repository to which the tag2upload tags are pushed, since this will be a
common scenario for sponsored uploads.

This analysis is relevant only for SHA-1-based Git repositories. Once
Salsa supports SHA-256 Git repositories, tag2upload could decline to act
on any repository that uses SHA-1 hash digests, making this entire section
moot.
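
A minimal sketch of such a policy check follows. It infers the object format from the length of a full hexadecimal object ID (40 digits for SHA-1, 64 for SHA-256); a real implementation would more likely ask Git directly, for example with `git rev-parse --show-object-format`. The `allow_sha1` transition switch is a hypothetical knob, not part of the tag2upload design.

```python
def object_format_of(oid: str) -> str:
    """Infer the object format from a full, lowercase hex object ID."""
    if oid and all(c in "0123456789abcdef" for c in oid):
        if len(oid) == 40:
            return "sha1"
        if len(oid) == 64:
            return "sha256"
    raise ValueError(f"not a full hex object ID: {oid!r}")


def should_process(tag_target_oid: str, allow_sha1: bool) -> bool:
    """Accept SHA-256 repositories always; SHA-1 only while explicitly allowed."""
    return object_format_of(tag_target_oid) == "sha256" or allow_sha1
```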

#### Replaying the tag

The attack: Construct a benign and malicious Git repository pair. Present
the benign repository to a sponsor for review. Once the sponsor has pushed
a signed tag for the benign repository, push the same tag to the malicious
Git repository, triggering tag2upload processing of that repository.

Since the signed tag that triggers tag2upload processing must specify both
the source package and version, the malicious repository must be for the
same source package and version. The attacker must therefore win a race
against the sponsor so that the malicious package is uploaded first.

This attack is noisy and thus vulnerable to detection. The sponsor would
receive two tag2upload notifications and an error from whichever upload
lost the race. The notification for the malicious repository would also
include its Salsa URL, which would not match the Salsa URL that the
sponsor was expecting.
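
The toy model below illustrates why the replay is noisy. It assumes, as described above, that only the first tag for a given source package and version can be uploaded and that the signer is notified of every outcome along with the repository URL. The class, message format, and `salsa.example` URLs are hypothetical.

```python
class UploadQueue:
    """Toy model: the first tag for a (source, version) wins; later ones fail.

    Every outcome is recorded as a notification to the tag signer,
    including the repository URL the tag was fetched from.
    """

    def __init__(self) -> None:
        self.accepted: dict[tuple[str, str], str] = {}
        self.notifications: list[str] = []

    def process(self, source: str, version: str, repo_url: str) -> bool:
        key = (source, version)
        if key in self.accepted:
            self.notifications.append(
                f"REJECTED {source} {version} from {repo_url}: already uploaded"
            )
            return False
        self.accepted[key] = repo_url
        self.notifications.append(f"uploaded {source} {version} from {repo_url}")
        return True
```

If the attacker's replay wins the race, the sponsor still receives two notifications, and the successful one names a repository the sponsor never reviewed.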

#### Moving the tag

The attack: As above, construct a benign and malicious Git repository pair
and get the benign repository signed by a sponsor. Race the tag2upload
server by deleting the signed tag from the benign repository, and then
push the same tag to the malicious Git repository.

This attack removes the doubled notification of the previous attack.
However, the race window is narrow, particularly if the attacker wants to
avoid an error notification to the sponsor from the tag2upload server
after it starts processing and then is unable to pull the signed tag from
the benign repository. Such an error would be suspicious and might lead to
detection.

As with the previous attack, the tag2upload notification to the sponsor
would include the Salsa URL of the malicious repository rather than the
one that the sponsor signed, which increases the likelihood of detection.

#### Replacing the upstream tree

The attack: Construct a benign and malicious Git tree pair containing only
the upstream source. Reference the benign tree in a source package and get
that source package signed by a sponsor to trigger tag2upload processing.
Race the tag2upload server by deleting the upstream tag and the commit it
references, and then pushing the malicious Git tree as a new commit with
the same commit ID.

The upstream tag name is present in the signed tag metadata, but since
that tag itself is not required to be signed, the attacker can move it at
will. The upstream tag therefore provides no protection against this
attack apart from a small detection risk. Authentication of the upstream
tree comes only from the inclusion of its commit ID in the tag metadata.
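
The distinction between authenticated and unauthenticated inputs can be sketched as follows. The metadata keys `upstream` and `upstream-tag`, and the plain dictionaries standing in for repository refs and objects, are assumptions for illustration. Only the commit ID from the signed metadata decides which tree is used; a mismatched upstream tag merely produces a warning, which corresponds to the small detection risk mentioned above.

```python
def resolve_upstream(
    signed: dict[str, str], refs: dict[str, str], objects: set[str]
) -> tuple[str, list[str]]:
    """Pick the upstream commit to use for building the orig tarball.

    signed: metadata from the signed tag (hypothetical keys)
    refs: ref name to object ID, as currently published on the server
    objects: object IDs present in the repository
    """
    commit = signed["upstream"]  # authenticated by the tag signature
    if commit not in objects:
        raise LookupError(f"upstream commit {commit} not present; refusing to build")
    warnings = []
    tag = signed.get("upstream-tag")  # the tag ref itself is unsigned
    if tag is not None and refs.get(f"refs/tags/{tag}") != commit:
        warnings.append(f"upstream tag {tag} does not point at {commit}")
    return commit, warnings
```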

I suspect (but am not certain) that this attack would normally be
prevented by the Salsa Git service. The benign tree already existed in the
same repository with the referenced commit ID (presumed to be checked by
the sponsor during review), and even if references to that object are
deleted via branch deletion, I believe Git will reject the push of the
malicious commit ID until the old objects have been garbage-collected.
This presumably will take long enough that the tag2upload process will
fail because the upstream commit is missing.
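
The behavior expected of Git here can be illustrated with a toy content-addressed store that refuses to replace an existing object with different content under the same ID until garbage collection has removed it. To make collisions easy to demonstrate, the sketch truncates SHA-1 digests to a configurable number of hex digits; this is a modelling device, not how Git works, and the paragraph above is itself uncertain about Git's exact behavior.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store.

    id_len truncates the hex digest so that collisions are easy to find in
    a demonstration; real Git uses the full digest.
    """

    def __init__(self, id_len: int = 40) -> None:
        self.id_len = id_len
        self.objects: dict[str, bytes] = {}

    def oid(self, data: bytes) -> str:
        return hashlib.sha1(data).hexdigest()[: self.id_len]

    def put(self, data: bytes) -> str:
        oid = self.oid(data)
        existing = self.objects.get(oid)
        if existing is not None and existing != data:
            raise ValueError(f"collision on {oid}: refusing to replace existing object")
        self.objects[oid] = data
        return oid

    def gc(self, reachable: set[str]) -> None:
        """Drop every object whose ID is not in the reachable set."""
        self.objects = {k: v for k, v in self.objects.items() if k in reachable}
```

Without the `gc` step the colliding object cannot be stored, which corresponds to the tag2upload run failing; an attacker who controls garbage collection can make the substitution succeed.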

This attack could be done by someone with administrative access to Salsa,
and thus in a position to force an immediate garbage collection of the
unreferenced objects so that the tree underlying the upstream commit ID
can be replaced. Administrative access to Salsa would also make it trivial
to win the race against the tag2upload server. This attack is less prone
to detection than moving the tag to a different Salsa repository.

There is a variation on this attack where the attacker deletes the Git tag
and tree that it references, pushes a colliding tree, and then repushes
the Git tag. I believe this has essentially the same properties as the
above attack.

### Second-preimage attacks

An attacker could attempt to take the signature from a tag2upload tag
pushed to an arbitrary repository on Salsa and apply that same signature
to a Git tag for a malicious package on Salsa with the same version
number, triggering the tag2upload process. However, this requires
constructing a malicious repository with the same SHA-1 hash digest as the
repository containing the original tag. This is a second-preimage attack
on SHA-1, which is believed to be currently infeasible. (Second-preimage
attacks are believed to be currently infeasible even against MD5, which is
a much weaker hash function.) tag2upload therefore prevents this attack.
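
The sketch below shows why a tag signature cannot simply be moved to another repository. An annotated Git tag object consists of a few header lines, including the object ID of the commit it points to, followed by a blank line and the message; the OpenPGP signature is computed over those bytes and appended to the message. Reusing the signature therefore requires a malicious commit whose object ID equals the original one. The tagger line and message in the test are placeholders.

```python
def tag_payload(target_oid: str, tag_name: str, tagger: str, message: str) -> bytes:
    """Bytes of an annotated tag object pointing at a commit, before signing.

    The detached OpenPGP signature covers exactly these bytes, so it is bound
    to target_oid and, through it, to the whole commit history and tree.
    """
    return (
        f"object {target_oid}\n"
        "type commit\n"
        f"tag {tag_name}\n"
        f"tagger {tagger}\n"
        "\n"
        f"{message}"
    ).encode()
```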

This same attack could be tried against the existing upload mechanism
by attempting to reuse the signature of an upload changes control file
published in the
[debian-devel-changes list archive](https://lists.debian.org/debian-devel-changes/).
The second-preimage resistance of the hash function used by the OpenPGP
signature similarly prevents this attack.

### Hash weakness in the OpenPGP signature

The OpenPGP signature over a source package control file, upload changes
control file, or Git tag also relies on a hash function to create the
digest that the public key signs.

To protect against collision attacks on the OpenPGP signature, both the
existing upload system and tag2upload should require that the signature
use a strong hash function, such as SHA-256. It is not clear to me what
hash functions the current upload architecture permits; this may have
already been done there. This restriction is part of the tag2upload
design.
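
Enforcing that restriction could look something like the sketch below. The numeric hash algorithm IDs are those assigned in RFC 4880, section 9.4. Reading the ID from the `VALIDSIG` status line that `gpgv --status-fd` emits follows the field order documented in GnuPG's `doc/DETAILS`, which is worth double-checking against the installed version; the accepted set is an assumed policy, not the one tag2upload or dak actually uses.

```python
# Hash algorithm IDs assigned in RFC 4880, section 9.4.
HASH_ALGORITHMS = {
    1: "MD5", 2: "SHA1", 3: "RIPEMD160",
    8: "SHA256", 9: "SHA384", 10: "SHA512", 11: "SHA224",
}
ACCEPTED_HASHES = {8, 9, 10}  # assumed policy: SHA-256 or stronger


def hash_algo_from_validsig(status_line: str) -> int:
    """Extract the hash algorithm ID from a GnuPG VALIDSIG status line."""
    parts = status_line.split()
    if parts[:2] != ["[GNUPG:]", "VALIDSIG"] or len(parts) < 11:
        raise ValueError("not a VALIDSIG status line")
    # Fields after the keyword: fingerprint, creation date, timestamp, expiry,
    # signature version, reserved, public-key algorithm, hash algorithm, class.
    return int(parts[9])


def check_signature_hash(algo_id: int) -> None:
    """Raise ValueError unless the signature used an accepted hash."""
    if algo_id not in ACCEPTED_HASHES:
        name = HASH_ALGORITHMS.get(algo_id, f"unknown ({algo_id})")
        raise ValueError(f"signature uses disallowed hash {name}")
```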

## Conclusions

The tag2upload architecture has some security advantages over the existing
upload architecture:

- Source package construction for those who use tag2upload is sandboxed,
  run on trusted systems, and less prone to variations or compromise from
  differences in the local build environment on every Debian uploader's
  personal systems.

- Each source package is associated with a Git repository and tag, which
  are permanently archived on the dgit-repos Git server. This provides
  more granular tracing information than retaining only the uploaded
  artifacts, which may or may not be traceable to a VCS repository or tag.

- Sponsors are nudged towards reviewing sponsored packages in Git if they
  wish to use the tag2upload service, which gives them more tools to
  notice introduction of malicious code and prevent it from being
  uploaded.

It introduces some new security risks, all of which I consider minor:

- The tag2upload server introduces another trusted component and some
  additional attack surface. The additional risk seems manageable, but is
  not zero. This centralizes risk that previously was distributed across
  the systems of individual uploaders.

- The tag2upload protocol relies on collision resistance of SHA-1 hashes
  of Git repository objects. Some attacks, particularly against a package
  sponsorship workflow, are possible if an attacker can construct a benign
  and malicious Git repository pair with the same SHA-1 hash digest. This
  attack is at present theoretical and seems likely to remain very
  challenging for some time to come.

I believe widespread adoption of tag2upload would represent a security
improvement for Debian. The availability of a more secure source package
construction system outweighs, in my opinion, the small additional risks
it would introduce. I do not believe it introduces any significant
security regressions.

Were tag2upload adopted, I would recommend some follow-on work:

- Verify that there are securely-archived backups of the dgit-repos Git
  server, since they contain useful information for tracing any discovered
  malicious packages.

- Set up a reproducible source package construction system that can
  independently verify the construction of a source package from a signed
  Git tag, which would provide additional assurance that the tag2upload
  server has not been compromised. This may be as simple as running an
  independent tag2upload server on separate infrastructure that would not
  be compromised by a compromise of Debian project systems.

- Adopt SHA-256 support for Git in Salsa as soon as it is available.
  Restrict tag2upload support to repositories using SHA-256 once the
  support is mature.

The purpose of the tag2upload design is not purely to improve the security
of source package uploads. It is intended to enable a new workflow that
some Debian contributors may prefer. In most cases in software
architecture, the security cost of new features can be reduced but not
eliminated entirely. There is some irreducible security risk from
introducing a new feature and thus new attack surface. Whether the risk is
worth the benefit is not a decision that can be made by a security review.

-- 
Russ Allbery (r...@debian.org)              <https://www.eyrie.org/~eagle/>