Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

Zack Weinberg Mon, 01 Apr 2024 10:54:55 -0700

On Sun, Mar 31, 2024, at 3:17 AM, Jacob Bachmeyer wrote:
> Eric Gallager wrote:
>> Specifically, what caught my attention was how the release tarball
>> containing the backdoor didn't match the history of the project in its
>> git repository. That made me think about automake's `distcheck`
>> target, whose entire purpose is to make it easier to verify that a
>> distribution tarball can be rebuilt from itself and contains all the
>> things it ought to contain.
>
> The problem is that a release tarball is a freestanding object, with no 
> dependency on the repository from which it was produced.  In this case, 
> the attacker added a bogus "update" of build-to-host.m4 from gnulib to 
> the release tarball, but that file is not stored in the Git repository.  
> This would not have tripped "make distcheck" because the crocked tarball 
> can indeed be used to rebuild another crocked tarball.
>
> As Alexandre Oliva mentioned in his reply, there is not really any good 
> way to prevent this, since the attacker could also patch the generated 
> configure script more directly.


I have been thinking about this incident and this thread all weekend and
have seen a lot of people saying things like "this is more proof that tarballs
are a thing of the past and everyone should just build straight from git".
There are a bunch of reasons why one might disagree with this as a blanket
statement, but I do think there's a valid point here: the malicious xz
maintainer *might* have been caught earlier if they had committed the
build-to-host.m4 modification to xz's VCS.  (Or they might not have!
Witness the three (and counting) malicious patches that they barefacedly
submitted to *other* projects, which were accepted because the malice was
subtle enough to pass code review.)

It might indeed be worth thinking about ways to minimize the difference
between the tarball "make dist" produces and the tarball "git archive"
produces, starting from the same clean git checkout, and also ways to
identify and audit those differences.
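One way to identify those differences is to diff the two tarballs member by member. What follows is only a minimal Python sketch, not an existing tool, and the function names are hypothetical. It assumes each tarball has a single top-level directory (as "make dist" and "git archive --prefix=..." normally produce), strips that directory, and reports files present in only one tarball and files whose contents differ. In a typical Autotools package the "only in make dist" list will also contain legitimately generated files such as configure and Makefile.in; that list is exactly what would need auditing.

```python
import hashlib
import sys
import tarfile


def member_digests(path):
    """Map each regular file in a tarball to the SHA-256 of its contents.

    Assumes every member sits under one top-level directory
    (e.g. "pkg-1.0/"); that component is stripped so tarballs made
    with different prefixes can be compared path by path.
    """
    digests = {}
    with tarfile.open(path) as tar:  # default mode auto-detects gz/bz2/xz
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            _, _, rel = member.name.partition("/")
            data = tar.extractfile(member).read()
            digests[rel or member.name] = hashlib.sha256(data).hexdigest()
    return digests


def compare_tarballs(dist_tarball, git_tarball):
    """Return (only_in_dist, only_in_git, differing) as sorted path lists."""
    dist = member_digests(dist_tarball)
    git = member_digests(git_tarball)
    only_in_dist = sorted(dist.keys() - git.keys())
    only_in_git = sorted(git.keys() - dist.keys())
    differing = sorted(p for p in dist.keys() & git.keys() if dist[p] != git[p])
    return only_in_dist, only_in_git, differing


if __name__ == "__main__" and len(sys.argv) == 3:
    labels = ("only in make dist", "only in git archive", "contents differ")
    for label, paths in zip(labels, compare_tarballs(sys.argv[1], sys.argv[2])):
        for p in paths:
            print(f"{label}: {p}")
```

The git side could come from something like `git archive --format=tar.gz --prefix=pkg-git/ -o pkg-git.tar.gz HEAD`, run against the same commit the release was made from.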

...
> Maybe the best revision to the GNU Coding Standards would be that 
> releases should, if at all possible, contain only text?  Any binary 
> files needed for testing can be generated during "make check" if 
> necessary

I don't think this is a good idea.  It's only a speed bump for someone
trying to smuggle malicious data into a package (think "base64 -d"), and
it makes life substantially harder for honest authors of programs that
work with binary data, and authors of material whose "source code"
(as GPLv3 uses that term) *is* binary data.  Consider pngsuite, for
instance (http://www.schaik.com/pngsuite/) -- it would be a *ton* of
work to convert each of these test PNG files into GNU Poke scripts,
and probably the result would be *less* ergonomic for purposes of
improving the test suite.

I would like to suggest a more useful policy: "files written to $prefix
by 'make install' should not have any data dependency on files labeled
as part of the package's testsuite".  That doesn't constrain honest
authors, and it seems within the scope of what the reproducible-builds
people could test for.  (Build the package, install to nonce prefix 1,
unpack the tarball again, delete the test suite, build again, install
to prefix 2, compare.)  Of course, a sufficiently determined malicious
coder could detect the reproducible-build test environment, but unlike
"no binary data", this would substantially raise the difficulty.
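The final "compare" step of that build-twice check could look roughly like the following. This is a hypothetical Python helper, not part of any reproducible-builds tool. It assumes each tree was installed with --prefix pointing at that directory (not via DESTDIR) and that the build is otherwise reproducible, so timestamps or other nondeterminism would show up as spurious differences. Because some installed files legitimately embed their prefix (pkg-config .pc files, for instance), each tree's own prefix is replaced with a placeholder before hashing.

```python
import hashlib
import os


def tree_digests(root):
    """Map each regular file under `root` (as a relative path) to a SHA-256.

    Occurrences of `root` inside file contents are replaced with a fixed
    placeholder before hashing, because some installed files legitimately
    embed their install prefix.
    """
    root = os.path.abspath(root)
    needle = root.encode()
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # symlinks and special files are not compared here
            with open(full, "rb") as f:
                data = f.read().replace(needle, b"@PREFIX@")
            digests[os.path.relpath(full, root)] = hashlib.sha256(data).hexdigest()
    return digests


def compare_installs(prefix1, prefix2):
    """Return sorted relative paths missing from either tree or differing."""
    a = tree_digests(prefix1)
    b = tree_digests(prefix2)
    missing = a.keys() ^ b.keys()
    changed = {p for p in a.keys() & b.keys() if a[p] != b[p]}
    return sorted(missing | changed)
```

An empty result means the build without the test suite installed the same files; any path listed is a candidate for auditing.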

zw
