[email protected] wrote:
> On 2024-03-30 18:25, Bruno Haible wrote:
>> Eric Gallager wrote:
>>> Hm, so should automake's `distcheck` target be updated to perform
>>> these checks as well, then?
>> The first mentioned check can not be automated. ...
>> The second mentioned check could be done by the maintainer, ...
> I agree that distcheck is good but not a cure all.  Any static system
> can be attacked when there is motive, and unit tests are easily gamed.
> The issue seems to be releases containing binary data for unit tests,
> instead of source or scripts to generate that data.  In this case, that
> binary data was used to smuggle in heavily obfuscated object code.

The best analysis in one place that I have found so far is
<URL:https://gynvael.coldwind.pl/?lang=en&id=782>. In brief, grep is
used to locate the main backdoor files by searching for marker strings.
After running tests/files/bad-3-corrupt_lzma2.xz through tr(1), it
becomes a /valid/ xz file that decompresses to a shell script that
extracts a second shell script from part of the compressed data in
tests/files/good-large_compressed.lzma and pipes it to a shell.
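
For concreteness, a sketch of that first stage (the tr arguments below
are the byte-swap reported in the public write-ups; treat them as
illustrative rather than authoritative):

    tr "\t \-_" " \t_\-" < tests/files/bad-3-corrupt_lzma2.xz \
        | xz -d > stage1.sh   # now a valid .xz stream; yields a script
    # the infected build effectively piped that script straight into sh
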
That second script has two major functions: first, it searches the test
files for four six-byte markers, and it then extracts and decrypts
(using a simple RC4-alike implemented in Awk) the binary backdoor also
found in tests/files/good-large_compressed.lzma. The six-byte markers
mark the beginning and end of raw LZMA2 streams obfuscated with a simple
substitution cipher. Any such streams found would be decompressed and
read by the shell, but neither of the known crocked releases had any
files containing those markers.
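
Purely as a structural illustration of that scan (the marker bytes and
the carving step below are placeholders, not the real ones):

    BEGIN_MARK='AAAAAA'   # hypothetical six-byte begin marker
    END_MARK='BBBBBB'     # hypothetical six-byte end marker
    for f in tests/files/*; do
        grep -q "$BEGIN_MARK" "$f" 2>/dev/null || continue
        echo "candidate: $f"
        # print whatever sits between the markers; the real script then
        # deciphered and decompressed that span before handing it to sh
        sed -n "s/.*${BEGIN_MARK}\(.*\)${END_MARK}.*/\1/p" "$f"
    done
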
The binary backdoor is an x86-64 object
that gets unpacked into liblzma_la-crc64-fast.o, unless m4/gettext.m4
contains "dnl Convert it to C string syntax.", which is a clever flag
because almost no one checks that the m4 files in release tarballs
actually match what the GNU project distributes. The object
itself is just the backdoor and presumably provides the symbol
_get_cpuid as its entrypoint, since the unpacker script patches the
src/liblzma/check/crc{64,32}_fast.c files in a pipeline to add calls to
that function and drops the compiled objects in .libs/. Running make
will then skip building those objects, since they are already
up-to-date, and the backdoored objects get linked into the final binary.
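
That last step relies on nothing more than make's timestamp check; a
toy demonstration with hypothetical file names (not the actual xz
build rules):

    printf 'int f(void) { return 0; }\n' > demo.c
    printf 'demo.o: demo.c\n\tcc -c demo.c -o demo.o\n' > Makefile
    # the "attacker" drops a pre-built object newer than demo.c
    printf 'int f(void) { return 1; }\n' | cc -x c -c -o demo.o -
    make    # reports "'demo.o' is up to date"; nothing is rebuilt
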
Commit 6e636819e8f070330d835fce46289a3ff72a7b89
(<URL:https://git.tukaani.org/?p=xz.git;a=commitdiff;h=6e636819e8f070330d835fce46289a3ff72a7b89>)
was an update to the backdoor. The commit message is suspicious,
claiming the use of "a constant seed" to generate reproducible test
files, but /not/ declaring how the files were produced, which of course
prevents reproducibility.
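
For contrast, a commit that actually wanted reproducible test data
would name its generator, along these lines (file name, seed, and
tools are invented for illustration; assumes sha256sum and xxd are
available):

    # tests/generate-random-file.sh: regenerate the binary test input
    seed='fixed-seed-42'
    i=0
    while [ "$i" -lt 256 ]; do
        printf '%s:%d' "$seed" "$i" | sha256sum | cut -d' ' -f1
        i=$((i + 1))
    done | xxd -r -p > tests/files/generated-random.bin
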
> With a reproducible build system, multiple maintainers can "make dist"
> and compare the output to cross-check for erroneous / malicious dist
> environments. Multiple signatures should be harder to compromise,
> assuming each is independent and generally trustworthy.

This can only work if a package /has/ multiple active maintainers.
You also have a small misunderstanding here: "make dist" prepares a
(source) release tarball, not a binary build, so this is a
closely-related issue but actually distinct from reproducible builds.
Also easier to solve, since we only have to make the source tarball
reproducible.
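
Concretely, each maintainer (or an independent verifier) would run
something like the following and compare notes; the tarball name is a
placeholder, and this presumes the dist rules themselves are
reproducible (pinned autotools versions, stable file ordering, clamped
timestamps):

    ./autogen.sh && ./configure
    make dist
    sha256sum foo-1.2.3.tar.gz   # publish this hash and compare it
                                 # against the other signers' results
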
> Maybe GNU should establish a cross-verification signing standard and
> "dist verification service" that automates this process? Point it to
> a repo and tag, request a signed hash of the dist package... Then
> downstream projects could check package signatures from both the
> maintainer and such third-party verifiers to check that nothing was
> inserted outside of version control.

Essentially, this would be an automated release building service: upon
request, make a Git checkout, run autogen.sh or equivalent, make dist,
and publish or hash the result. The problem is that an attacker who
manages to gain commit access to a repository may be able to launch
attacks on the release building service, since "make dist" can run
scripts. The service could probably mount its working filesystem noexec,
since preparing a source release should not require running (non-system)
binaries, and scripts can still be run by feeding them directly into
their interpreters even on a noexec filesystem. That mitigation still
leaves every installed interpreter and system tool available to a
hostile repository, however.
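
A rough outline of one run of such a service (the repository URL and
tag are placeholders standing in for the request parameters; this is
not a worked-out design):

    REPO_URL=https://example.org/project.git
    TAG=v1.2.3
    git clone --depth 1 --branch "$TAG" "$REPO_URL" work
    cd work
    sh ./autogen.sh   # run via the interpreter rather than relying on
    sh ./configure    # the executable bit
    make dist
    sha256sum *.tar.* > ../dist.sha256   # publish and/or sign this
    # Caveat from above: a noexec working mount only blocks direct
    # execution of checked-in binaries; a hostile repository can still
    # feed scripts to any installed interpreter during these steps.
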
-- Jacob