Re: Validating tarballs against git repositories

Russ Allbery Fri, 29 Mar 2024 23:29:38 -0700

Antonio Russo <antonio.e.ru...@gmail.com> writes:

> The way I see it, there are two options in handling a buildable package:

> 1. That file would have been considered a build artifact, consequently
> removed and then regenerated.  No backdoor.

> 2. The file would not have been scrubbed, and a difference between the
> git version and the released tar version would have been noticed.
> Backdoor found.

> Either of these is, in my mind, dramatically better than what happened.

I think the point that you're assuming (probably because you quite
reasonably think it's too obvious to need to be stated, but I'm not sure
it's that obvious to everyone) is that malicious code injected via a
commit is significantly easier to detect than malicious code that is only
in the release tarball.

This is not *always* correct; it really depends on how many eyes are on
the upstream repository and how complex or unreadable the code upstream
writes normally is.  (For example, I am far from confident that I can
eyeball the difference between valid and malicious procmail-style C code
or random M4 files.)  I think it's clearly at least *sometimes* correct,
though, so I'm sympathetic, particularly given that it's already Debian
practice to regenerate the build system files anyway.

In other words, we should make sure that breaking the specific tactics
*this* attacker used truly makes the attacker's life harder, as opposed to
making life harder for Debian packagers while only forcing a one-time,
minor shift in attacker tactics.  I *think* I'm mostly convinced that
forcing the attacker into Git commits is a useful partial defense, but I'm
not sure this is obviously true.

> Ok, so am I understanding you correctly in that you are saying: we do
> actually want *some* build artifacts in the source archives?

> If that's the case, could we make those files at packaging time, analogous
> to the DFSG-exclude stripping process?

If I have followed this all correctly, I believe that in this case the
exploit is not in a build artifact.  It's in a very opaque source artifact
that is different in the release tarball from the Git archive.  Assuming
that I have that right, stripping build artifacts wouldn't have done
anything about this exploit, but comparing Git and release tarballs would
have.
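
As a rough illustration of that comparison, here is a minimal Python
sketch that hashes every file in two already-unpacked trees and reports
what differs.  The function names and the idea of comparing two
directories on disk (an extracted release tarball and a checkout of the
matching tag) are assumptions made for the example, not anything Debian
tooling currently does.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each regular file's path (relative to root) to its SHA-256."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        rel_parts = path.relative_to(root).parts
        if ".git" in rel_parts:
            continue  # skip the repository metadata itself
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(release_dir, git_dir):
    """Return (only_in_release, only_in_git, differing) as sorted lists."""
    release = tree_digests(release_dir)
    git = tree_digests(git_dir)
    only_in_release = sorted(set(release) - set(git))
    only_in_git = sorted(set(git) - set(release))
    differing = sorted(
        p for p in set(release) & set(git) if release[p] != git[p]
    )
    return only_in_release, only_in_git, differing
```

The interesting output is the third list: a file present in both trees
with different content is the case described above.  Files only in the
release tarball still need a human to decide whether they are expected
generated files.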

I think here you're anticipating a *different* exploit, one carried in
build artifacts that Debian didn't remove and reconstruct, and arguing
that we should remove those artifacts from our upstream source archives
to ensure that we can't accidentally ship such an exploit.

> On 2024-03-29 22:41, Guillem Jover wrote:

>> (For dpkg at least I'm pondering whether to play with switching to
>> doing something equivalent to «git archive» though, but see above, or
>> maybe generate two tarballs, a plain «git archive» and a portable one.)

Yeah, with my upstream hat on, I'm considering something similar, but I
still believe I have users who want to compile from source on systems
without current autotools, so I still need separate release tarballs.
Having to generate multiple release artifacts (and document them, and
explain to people which ones they want, etc.) is certainly doable, but I
can't say that I'm all that thrilled about it.

I think with my upstream hat on I'd rather ship a clear manifest (checked
into Git) that tells distributions which files in the distribution tarball
are build artifacts, and guarantee that if you delete all of those files,
the remaining tree should be byte-for-byte identical with the
corresponding signed Git tag.  (In other words, Guillem's suggestion.)
Then I can continue to ship only one release artifact.
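
To make the manifest idea concrete, here is a small Python sketch of the
check a distribution could run.  It assumes the tarball and the signed
tag have each already been reduced to a mapping from relative path to
content digest, and that the manifest has been parsed into a set of
paths; the function name and the wording of the problem strings are
hypothetical.

```python
def check_release_against_tag(release, git_tag, build_artifacts):
    """Compare a release tarball with a Git tag, ignoring declared artifacts.

    release and git_tag map relative paths to content digests;
    build_artifacts is the set of paths the manifest declares as generated.
    Returns a sorted list of (path, problem) tuples; empty means a match.
    """
    # Delete everything the manifest declares to be a build artifact.
    stripped = {p: d for p, d in release.items() if p not in build_artifacts}

    problems = []
    for path in sorted(set(stripped) | set(git_tag)):
        if path not in git_tag:
            problems.append((path, "in tarball but not in Git or manifest"))
        elif path not in stripped:
            problems.append((path, "in Git but not in stripped tarball"))
        elif stripped[path] != git_tag[path]:
            problems.append((path, "content differs from Git"))
    return problems
```

An empty result is the byte-for-byte guarantee described above.  Anything
else, including a file that appears in the tarball without being either
in Git or in the manifest, is something to investigate before upload.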

> I take a look at these every year or so to keep me terrified of C!  If
> it's a single upstream developer, I absolutely agree, but if there's an
> upstream community reviewing the git commits, I really do believe there
> is hope (of them!) identifying bad(tm) things.

A single upstream developer is the most common case, though.  Perhaps less
so for core libraries, but, well, there are plenty of examples.  (To pick
another one that comes readily to mind, zlib appears to only have one
active maintainer.)

The reality that we are struggling with is that the free software
infrastructure on which much of computing runs is massively and painfully
underfunded by society as a whole, and is almost entirely dependent on
random people maintaining things in their free time because they find it
fun, many of whom are close to burnout.  This is, in many ways, the true
root cause of this entire event.

The sad irony here is that the xz maintainer tried to do exactly what we
advise people in this situation to do: try to add a comaintainer to share
the work, and don't block work because you don't have time to personally
vet everything in detail.  The outcome is *exactly* why maintainers often
don't want to do that, and thus force people to fork packages rather than
join in maintaining the existing package.

This is an aside, but this is why my personal policy for my own projects
that I no longer have time to maintain is to orphan them and require that
someone fork them, not add additional contributors to my repository or
release infrastructure.  I do not have the resources to vet new
maintainers -- if I had that time to spend on the projects, I wouldn't
have orphaned them -- and therefore I want to explicitly disclaim any
responsibility for what the new maintainer may do.  Someone else will have
to judge whether they are trustworthy.  But I'm not sure that
distributions are in a good position to do that *either*.

> But, I will definitely concede that, had I seen a commit that changed
> that line in the m4, there's a good chance my eyes would have glazed
> over it.

This is why I am somewhat skeptical that forcing everything into Git
commits is as much of a benefit as people are hoping.  This particular
attacker thought it was better to avoid the Git repository, so that is
evidence in support of that approach, and it's certainly more helpful,
once you know something bad has happened, to be able to use all the Git
tools to figure out exactly what happened.  But I'm not sure we're fully
accounting for the fact that tags can be moved, branches can be
force-pushed, and if the Git repository is somewhere other than GitHub,
the malicious possibilities are even broader.

We could narrow those possibilities somewhat by maintaining
Debian-controlled mirrors of upstream Git repositories so that we could
detect rewritten history.  (There are a whole lot of reasons why I think
dgit is a superior model for archive management.  One of them is that it
captures the full Git history of upstream at the point of the upload on
Debian-controlled infrastructure if the maintainer of the package bases it
on upstream's Git tree.)
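
As a sketch of what detecting rewritten history from a mirror could look
like, the Python below compares the refs a mirror recorded earlier with
what upstream advertises now.  Everything here is an assumption for
illustration: the commit graph is a plain dictionary of parent links
rather than a real repository, tags are treated as pointing directly at
commits, and the function names are made up.  Against a real repository
the ancestry test would more likely be delegated to
`git merge-base --is-ancestor`.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if ancestor is reachable from descendant via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def detect_rewrites(mirror_refs, upstream_refs, parents):
    """Report refs whose upstream value is not a fast-forward of the mirror.

    mirror_refs and upstream_refs map ref names to commit IDs; parents maps
    a commit ID to the list of its parent commit IDs.
    """
    findings = []
    for ref, old in sorted(mirror_refs.items()):
        new = upstream_refs.get(ref)
        if new is None:
            findings.append((ref, "deleted upstream"))
        elif new == old:
            continue
        elif ref.startswith("refs/tags/"):
            # A published tag should never change at all.
            findings.append((ref, "tag moved"))
        elif not is_ancestor(parents, old, new):
            # The old tip is no longer part of the branch's history.
            findings.append((ref, "branch history rewritten"))
    return findings
```

A branch that simply gained new commits produces no finding; a moved tag
or a force-pushed branch does.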

-- 
Russ Allbery (r...@debian.org)              <https://www.eyrie.org/~eagle/>
