Bruno Haible wrote:
Richard Stallman commented on Jacob Bachmeyer's idea:
> > Another related check that /would/ have caught this attempt would be
> > comparing the aclocal m4 files in a release against their (meta)upstream
> > sources before building a package. This is something distribution
> > maintainers could do without cooperation from upstream. If
> > m4/build-to-host.m4 had been recognized as coming from gnulib and
> > compared to the copy in gnulib, the nonempty diff would have been
> > suspicious.
I have a hunch that some effort is needed to do that comparison, but
that writing a script to do it could make it easy.
Is that so?
Yes, the technical side of such a comparison is relatively easy to
implement:
- There are fewer than about 2000 to 5000 *.m4 files that are shared
between projects. Downloading and storing all historical versions
of these files will take ca. 0.1 to 1 GB.
- They would be stored in a content-based index, i.e. indexed by
sha256 hash code.
- A distribution could then quickly test whether a *.m4 file found
in a distribution tarball is "known".
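
A minimal sketch of such a content-based index, in Python. It assumes the
upstream versions have already been checked out or unpacked under one
directory (walking the git history to collect every historical version is
left out), and the function names build_index and unknown_m4_files are
made up for illustration:

```python
import hashlib
from pathlib import Path


def sha256_of(data: bytes) -> str:
    """Hex-encoded sha256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def build_index(upstream_root: Path) -> dict[str, list[str]]:
    """Map sha256 digest -> relative paths of all *.m4 files under upstream_root.

    Assumption: upstream_root holds the collected upstream copies
    (e.g. one subdirectory per repository or release).
    """
    index: dict[str, list[str]] = {}
    for path in sorted(upstream_root.rglob("*.m4")):
        digest = sha256_of(path.read_bytes())
        index.setdefault(digest, []).append(
            path.relative_to(upstream_root).as_posix()
        )
    return index


def unknown_m4_files(tarball_root: Path, index: dict[str, list[str]]) -> list[str]:
    """List *.m4 files in an unpacked tarball whose contents are not in the index."""
    return [
        path.relative_to(tarball_root).as_posix()
        for path in sorted(tarball_root.rglob("*.m4"))
        if sha256_of(path.read_bytes()) not in index
    ]
```

Anything returned by unknown_m4_files is a candidate for the manual review
described next; a file that is byte-identical to some known upstream
version never shows up.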
The recurring, time-consuming part is, whenever an "unknown" *.m4 file
appears, to
- manually review it,
- update the list of upstream git repositories (e.g. when a project
has been forked) or the list of releases to consider (e.g. snapshots
of GNU Autoconf or GNU libtool, or distribution-specific modifications).
I agree with Jacob that a distro can put this in place, without needing
to bother upstream developers.
I have since thought of a simple solution that /would/ have stopped this
backdoor campaign in its tracks: an "autopoint --check" command that
simply compares the m4/ files (and possibly others?) present in the
package tree against the files that autopoint would copy in if m4/ were
empty, and reports any differences. A newer serial in the package tree than
in the system m4 library produces a minor complaint; a file with the same
serial and different contents produces a major complaint. An older
serial in the package tree should be reported, but is likely to be of no
consequence if a distribution's packaging routine will copy in the
known-good newer version before rebuilding configure. Any m4/ files
local to the package are simply reported, but those are also in the
package's Git repository.
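
To make the classification concrete, here is a rough Python sketch of the
comparison rules. It is not autopoint code, and "--check" is only the
option proposed here. The serial-line pattern is an assumption meant to
cover both "# serial N" and "# foo.m4 serial N", and treating a differing
file without a parsable serial as a major complaint is my own choice:

```python
import re
from typing import Optional

# Assumed serial syntax: a comment line containing "serial <number>".
SERIAL_RE = re.compile(r"^#.*\bserial\s+(\d+)", re.MULTILINE)


def serial_of(text: str) -> Optional[int]:
    """Return the first serial number found in an m4 file, or None."""
    match = SERIAL_RE.search(text)
    return int(match.group(1)) if match else None


def classify(package_text: str, system_text: Optional[str]) -> str:
    """Classify one m4 file from the package tree.

    system_text is the copy that would be installed from the system
    m4 library, or None if the file is local to the package.
    """
    if system_text is None:
        return "local"   # simply reported; also in the package's Git repository
    if package_text == system_text:
        return "ok"
    pkg, ref = serial_of(package_text), serial_of(system_text)
    if pkg is None or ref is None:
        return "major"   # assumption: cannot compare serials, contents differ
    if pkg > ref:
        return "minor"   # newer serial in the package tree
    if pkg < ref:
        return "older"   # reported, likely harmless if overwritten on rebuild
    return "major"       # same serial, different contents
```

Under these rules a file with an unchanged serial and altered contents
lands in the "major" bucket, while one with a bumped serial is only a
"minor" complaint; either way it is flagged for a closer look.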
Distribution package maintainers would run "autopoint --check" and pass
any suspicious files to upstream maintainers for evaluation. (The
distribution's own packaging system can trace an m4 file in the system
library back to its upstream package.) The modified build-to-host.m4
would have been very /unlikely/ to slip past the
gnulib/gettext/Automake/Autoconf maintainers, although few distribution
packagers would have had suspicions. The gnulib maintainers would know
that gl_BUILD_TO_HOST should not be checking /anything/ itself and the
crackers would have been caught.
This should be effective in closing off a large swath of possible
attacks: a backdoor concealed in binary test data (or documentation)
requires some visible means to unpack it, which means the unpacker must
appear in source somewhere. While the average package maintainer might
not be able to make sense of a novel m4 file, the maintainers of GNU's
version of that file /will/ be able to recognize such chicanery, and the
"red herrings" the cracker added for obfuscation would become a
liability. Without them, the effect of the new code is more obvious, so
the crackers lose either way.
-- Jacob