Richard Stallman commented on Jacob Bachmeyer's idea:
> > > Another related check that /would/ have caught this attempt would be
> > > comparing the aclocal m4 files in a release against their (meta)upstream
> > > sources before building a package. This is something distribution
> > > maintainers could do without cooperation from upstream. If
> > > m4/build-to-host.m4 had been recognized as coming from gnulib and
> > > compared to the copy in gnulib, the nonempty diff would have been
> > > suspicious.
>
> I have a hunch that some effort is needed to do that comparison, but
> that writing a script to do it could make it easy.
> Is that so?
Yes, the technical side of such a comparison is relatively easy to
implement:
- There are fewer than roughly 2000 to 5000 *.m4 files that are shared
between projects. Downloading and storing all historical versions
of these files would take about 0.1 to 1 GB.
- They would be stored in a content-based index, i.e. indexed by
sha256 hash code.
- A distribution could then quickly test whether a *.m4 file found
in a distribution tarball is "known".
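A minimal Python sketch of such a content-based index and the "known" test follows. It assumes (hypothetically) that all collected historical *.m4 files sit under one local directory, e.g. known/<project>/<version>/m4/, and that a release is a tarball. The names build_index, unknown_m4_in_tarball and sha256_bytes are illustrative only, not part of any existing tool.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_bytes(data: bytes) -> str:
    """Content key: hex SHA-256 of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def build_index(known_root: Path) -> dict[str, list[str]]:
    """Map sha256 -> every known location with that exact content.

    Assumed layout (hypothetical): known_root/<project>/<version>/.../*.m4
    """
    index: dict[str, list[str]] = {}
    for path in sorted(known_root.rglob("*.m4")):
        digest = sha256_bytes(path.read_bytes())
        index.setdefault(digest, []).append(str(path.relative_to(known_root)))
    return index


def unknown_m4_in_tarball(tarball: Path, index: dict[str, list[str]]) -> list[str]:
    """Return names of *.m4 members whose content hash is not in the index."""
    unknown: list[str] = []
    # Mode "r" (the default) auto-detects gzip/bzip2/xz compression.
    with tarfile.open(tarball) as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".m4")):
                continue
            f = tar.extractfile(member)
            if f is None:
                continue
            if sha256_bytes(f.read()) not in index:
                unknown.append(member.name)
    return unknown
```

Any name this returns is a candidate for the manual review step; a byte-identical copy of a known upstream file passes silently, while a modified copy such as a tampered build-to-host.m4 does not.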
The part that recurrently costs time is what has to happen whenever an
"unknown" *.m4 file appears:
- review it manually, and
- update the list of upstream git repositories (e.g. when a project
has been forked) or the list of releases to consider (e.g. snapshots
of GNU Autoconf or GNU libtool, or distribution-specific modifications).
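For the manual review, it can help to diff an unknown file against the most similar known file with the same name; that is the kind of nonempty diff that would have made m4/build-to-host.m4 look suspicious. This is a hypothetical Python sketch using only the standard difflib module. Choosing the "closest" candidate by SequenceMatcher ratio is an assumption for illustration, not an established procedure, and review_diff and closest_known are made-up names.

```python
from __future__ import annotations

import difflib
from pathlib import Path


def closest_known(text: str, known_root: Path, basename: str) -> tuple[Path | None, float]:
    """Pick the known file with the same basename whose content is most similar."""
    best: Path | None = None
    best_ratio = 0.0
    for cand in sorted(known_root.rglob(basename)):
        other = cand.read_text(errors="replace")
        ratio = difflib.SequenceMatcher(None, other, text).ratio()
        if best is None or ratio > best_ratio:
            best, best_ratio = cand, ratio
    return best, best_ratio


def review_diff(unknown_path: Path, known_root: Path) -> str:
    """Unified diff from the closest known copy to the unknown file."""
    text = unknown_path.read_text(errors="replace")
    best, _ = closest_known(text, known_root, unknown_path.name)
    if best is None:
        # Nothing upstream with this name: the whole file needs review.
        return f"no known file named {unknown_path.name}; full manual review needed\n"
    diff = difflib.unified_diff(
        best.read_text(errors="replace").splitlines(keepends=True),
        text.splitlines(keepends=True),
        fromfile=str(best),
        tofile=str(unknown_path),
    )
    return "".join(diff)
```

Lines starting with "+" in the output are what the release adds on top of the nearest upstream version; a reviewer would then decide whether the file is a legitimate new version (and add it to the index) or a red flag.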
I agree with Jacob that a distro can put this in place, without needing
to bother upstream developers.
Bruno