Thank you very much for this review!  You were one of the people I most
wanted to hear from, since I know you have substantial expertise in the
cryptography parts of this and a lot of security experience.  I know you
also had concerns in the past about hash collisions specifically.

Simon Josefsson <si...@josefsson.org> writes:

> Generally I reach the same conclusion, although I think there are real
> security problems with both the existing and the proposed tag2upload
> mechanism that we should all be aware of.

Yes.  tag2upload doesn't fundamentally change the security model of
uploads; most flaws that we currently have we still have with tag2upload.

> It is acceptable to realize that we cannot protect against all attacks
> with reasonable costs.  That's why we need the ability to transparently
> audit all steps, to detect them when they occur.  Reversely: it would be
> unfortunate to say no to new functionality because the new functionality
> don't solve all possible problems.  That just stalls progress.

Wholeheartedly agreed.

>> ## Threat model
>>
>> I evaluated both the existing source package upload architecture and the
>> tag2upload architecture against the following threats:
>>
>> - Someone not in the keyring uploads a malicious source package, possibly
>>   via a sponsor.
>>   
>> - Someone in the keyring (either a Debian Developer or a Debian Maintainer
>>   for a package) uploads a malicious source package but makes it appear
>>   that the package was uploaded by someone else in the keyring.
>>
>> - An attacker compromises the system a Debian uploader uses to build
>>   source packages and uses that access to inject malicious code into a
>>   source package.
>>
>> - Someone with administrative access to the archive processing machinery
>>   (DAK, the archive signing key, or similar infrastructure) uploads a
>>   malicious source package.
>>   
>> - Someone with administrative access to the tag2upload server or its
>>   signing key uploads a malicious source package.
>>   
>> - Someone with administrative access to Salsa uploads a malicious source
>>   package.

> Having a threat model is great.  I find the notion of "uploads a source
> package" is poorly defined here though.

Yes, I used this sloppily.  In some places I mean uploads the package to
dak, in some places I mean pushed a signed Git tag, and in other places I
mean introduces the package into the archive.  I'll try to find some time
to tighten up the wording to be a bit clearer about which specific action
I'm talking about in each case.

> What threat model of those (if any) cover the situation were someone in
> the keyring uploads a (benign) source package and something on Debian's
> side (e.g., design of tag2upload) enables an attacker to substitute some
> part of the intended upload with something malicious?

It's effectively equivalent to:

- Someone with administrative access to the tag2upload server or its
  signing key uploads a malicious source package.

and is discussed at some length in the corresponding section, but I should
call that out as a separate threat.

>> ### Git object collisions
>>
>> The current Git repository format and wire protocols use SHA-1 hash
>> digests (and only SHA-1 hash digests) to identify objects in the Git
>> repository. Git uses a SHA-1 hash function that has been
>> [hardened against the SHAttered attack on 
>> SHA-1](https://github.com/cr-marcstevens/sha1collisiondetection),
>> and therefore is probably not vulnerable to known collision attacks.

> Can this be substantiated?  Using SHA1CD in Git does not necessarily
> mean someone cannot manually create a Git repository with a colliding
> git commit somewhere in the history that gets accepted by git, and
> allows someone to replace actual file contents.  That may be the case,
> but I haven't seen any detailed analysis answering that.

This was a really interesting point that I didn't catch.  Thank you!  Let
me try to rephrase this in the form of an attack and see if this captures
what you were getting at.

The attack: Using a pre-SHA-1DC version of Git, construct a benign and
malicious pair of Git trees that diverge at some point by abusing the hash
of an object vulnerable to SHAttered.  Push the benign tree to Salsa,
relying on Salsa not reverifying the hashes of new objects with a hardened
hash, or alternately have already planted the benign tree in a Git
repository imported into Salsa before SHA-1DC was in use.  Get that tree
signed by a sponsor, again relying on the sponsor's git client not
revalidating object hashes, and then follow the same attack pattern in
either "Moving the tag" or "Replacing the upstream tree."  Rely on the
tag2upload server not reverifying the hashes of the Git tree when it pulls
it to construct the signed source package.

In essence, this attack exploits the fact that Git is lazy about
performing hashes and usually only does so when it has to.  I'm not sure
this assumption is correct for Salsa in particular, but it's at least
plausible.  The trees used in this attack would fail git fsck, because the
critical object would hash to a different value using SHA-1DC than it does
with SHA-1, but it's not clear that git fsck is called at any of the
points that would detect this attack.

I believe this attack would be prevented by setting transfer.fsckObjects
to true in the Git configuration of the tag2upload worker and failing the
operation if it detects anything.  (Or, equivalently, calling git fsck
after git clone and failing on any detected problems.)  I believe this
forces recomputation of the hashes of all received objects.  The object
used in this attack would fail that hash recomputation because the
tag2upload server would use a version of Git that uses SHA-1DC.  The cost
is a performance penalty on git clone, which would be trivial for most
repositories but which might be noticable for particularly large Git
trees.

My personal opinion is that always setting transfer.fsckObjects to true is
good practice anyway to catch more banal problems such as disk corruption
and memory bit flips, so while I'm not sure I would bother just for this
attack, it might be a good idea on general principles.

>> This analysis is relevant only for SHA-1-based Git repositories. Once
>> Salsa supports SHA-256 Git repositories, tag2upload could decline to
>> act on any repository that uses SHA-1 hash digests, making this entire
>> section moot.

> I don't think it will be as simple as that: the git SHA256 transition
> documents suggests to me that even signed tags may refer to both SHA256
> and SHA1 commits:

> https://git-scm.com/docs/hash-function-transition#_signed_tags

> Thus tag2upload would need to require 1) SHA256 Git repository support,
> AND 2) that git tags refer to a SHA256 commit id, AND 3) any git
> submodules used also rely on SHA256 rather than SHA1.

This is what I meant by "decline to act on any repository that uses SHA-1
hash digests."  I didn't get into the details because this isn't something
that we can adopt currently, so working out the specifics seemed
premature.  Once this is an option, we should figure out a transition
plan.

>> #### Replacing the upstream tree
>>
>> The attack: Construct a benign and malicious Git tree pair containing
>> only the upstream source. Reference the benign tree in a source package
>> and get that source package signed by a sponsor to trigger tag2upload
>> processing.  Race the tag2upload server by deleting the upstream tag
>> and commit ID and then pushing the malicious Git repository as a new
>> commit with the same commit ID.

> I think this is an important and realistic attack vector that we
> shouldn't be vulnerable to.

For the record, I am still dubious that this is that important of an
attack vector.  Even using SHAttered to create conflicting Git trees is
quite difficult; the mere presence of a hash collision is not enough to
attack Git in a way that results in a creation of a malicious binary
package.  Git objects have structure and hash collisions require hiding
attacker-generated data in that structure, which is not trivial.  And a
successful attack on Git does not automatically mean that tag2upload is
vulnerable, since using a centralized Git server makes the attack more
difficult.

I'm also dubious that Debian would see a zero-day attack with a novel
SHA-1 collision.  It's more likely that, as with SHAttered, we would get
substantial advance warning of a new SHA-1 collision before an attack on
Git specifically was feasible.  We are far from the only entity that
relies on Git security, and there are significant resources behind the Git
community, so I think the most likely outcome of a new SHA-1 collision
would be something similar to what happened with SHAttered.

I agree that this attack is theoretically possible, and I agree that the
longer we go on using SHA-1 as a basis for Git trees, the more likely the
attack becomes.  But there are substantial obstacles in the way, and I
think we're still some years away from the point where I would lose sleep
worrying about this.

Or, put another way, if this were anywhere close to the easiest way to
attack Debian, we would be doing astonishingly well in our security
posture.

I'm still talking about it here because I do want to be thorough, and I'm
in favor of protecting against unlikely attacks if one can do so easily.
There were a few minor tweaks to the design that I already suggested and
that were accepted which make the detection story for hash collisions
better.  But since you specifically said that the attack was important, I
did want to push back on that a bit.

>> The upstream tag name is present in the signed tag metadata, but since
>> that tag itself is not required to be signed, the attacker can move it
>> at will. The upstream tag therefore provides no protection against this
>> attack apart from a small detection risk. Authentication of the
>> upstream tree comes only from the inclusion of its commit ID in the tag
>> metadata.

> Which is SHA1 currently, and thus vulnerable to a collision attack,
> which are known to be possible.

Correct.

>> This attack could be done by someone with administrative access to
>> Salsa, and thus in a position to force an immediate garbage collection
>> of the unreferenced objects so that the tree underlying the upstream
>> commit ID can be replaced. Administrative access to Salsa would also
>> make it trivial to win the race against the tag2upload server. This
>> attack is less prone to detection than moving the tag to a different
>> Salsa repository.

>> There is a variation on this attack where the attacker deletes the Git
>> tag and tree that it references, pushes a colliding tree, and then
>> repushes the Git tag. I believe this has essentially the same
>> properties as the above attack.

> Is Salsa admin access necessary?  Isn't 'Maintainer' access sufficient?

I dug into this a bit, and I believe GitLab imposes a 30 minute minimum on
unreferenced object retention even if it's manually triggered by a
maintainer.  See:

https://salsa.debian.org/help/administration/housekeeping.md#prune-unreachable-objects

That is going to make it hard to win a race after the tag has been pushed.

It may be more feasible to win a race *before* the tag has been pushed.
In other words, ask the Debian uploader to review your package, wait for
them to clone the package that they're going to review, and then replace
the underlying upstream tree before they push the signed tag.  The timing
is tricky but it may be possible.

A Salsa administrator can run git gc --prune=now directly in the
underlying file system, which makes winning a race much easier.

Sneaking malicious code into the upstream tree without any cryptographic
tricker and just assuming the sponsor will miss it is a more likely
attack.  I'm not sure that an attacker really gains much by doing lots of
novel cryptographic attacks and racing a maintainer just to hide the code
from sponsorship review.  I fear that most sponsorship review is
insufficiently thorough to catch malicious code.  This therefore feels to
me a bit like a highly elaborate attack on a window lock for a house whose
front door is wide open, which is part of why I'm dubious that it's that
important.

> To me, the protection against this attack seems weak, and I don't really
> see any strong protection against it.

The primary protection is that there are fairly good reasons to believe no
one currently knows how to generate colliding Git trees that would make
any of these attacks possible.

> Couldn't we require that the signed tag2upload tag comment contains a
> hash of the upstream code, somehow?

Technically, we could.  This is what git-evtag does.  The drawback is that
it requires a non-standard tag format and an external (and somewhat
non-trivial) tool.

>> ### Second-preimage attacks

>> An attacker could attempt to take the signature from a tag2upload tag
>> pushed to an arbitrary repository on Salsa and apply that same
>> signature to a Git tag for a malicious package on Salsa with the same
>> version number, triggering the tag2upload process. However, this
>> requires constructing a malicious repository with the same SHA-1 hash
>> digest as the repository containing the original tag. This is a
>> second-preimage attack on SHA-1, which is believed to be currently
>> infeasible. (Second-preimage attacks are believed to be currently
>> infeasible even against MD5, which is a much weaker hash function.)
>> tag2upload therefore prevents this attack.

> Is a second preimage really required to mount this attack?  Consider if
> someone creates a collision for a good and a bad version, gets the good
> version signed by a sponsor, and then re-use that signature for the bad
> version.

That's a different attack discussed earlier under "Moving the tag."  This
section is about performing the attack on an arbitrary tag2upload Git tag
for a repository that isn't under the attacker's control.

Arguably I should drop this section, since what it's saying is so obvious
to anyone with knowledge of the cryptography properties in play that
people may read too much into it and assume it's saying something it
isn't.

-- 
Russ Allbery (r...@debian.org)              <https://www.eyrie.org/~eagle/>

Reply via email to