On Wed, 12 Jun 2024 at 08:53:40 -0400, Scott Kitterman wrote:
> On Tuesday, June 11, 2024 6:25:02 PM EDT Sean Whitton wrote:
> > - it improves the traceability and auditability of our source-only
> >   uploads, in ways that are particularly salient in the wake of xz-utils.
> 
> As I understand it, Debian was affected by the xz-utils hack, in part,
> because some artifacts were inserted into an upstream tarball that were not
> represented in the upstream git.  Please explain how use of tag2upload is 
> relevant to this scenario?  I'm afraid I don't follow.

I think the claim here might be that Debian should stop dealing with
upstream source tarball releases, and instead have the packaging be
branched from upstream git? It isn't explicit in the proposal, and is not
*necessarily* mandatory for tag2upload, but the mentions of generating
".orig" tarballs for consumption by the ftp archive via `git-deborig`
(which is an adjusted git-archive) would seem to imply that the proponents
of tag2upload would like to go in the direction of not redistributing
upstreams' official source-code archives as 1:1 binary blobs.

As a concrete example, for bubblewrap_0.9.0 (a convenient example
of a relatively small package), that would mean that instead
of having our packaged version of bubblewrap be based on the
bubblewrap-0.9.0.tar.xz with sha256 c6347eac... which can be downloaded
from https://github.com/containers/bubblewrap/releases/tag/v0.9.0, our
packaged version of bubblewrap would be based on the tree that forms part
of the tagged commit 8e51677a... in upstream git.

If we did that for xz-utils, then the xz-utils attacker would have
had to include the glue code to activate their malicious payload in
the upstream git history, and not just the official tarball release -
which would hopefully have made it more likely that it would have been
discovered before we integrated the malicious version.

I think that's going to be a harder sell for some packages than for
others. For packages that build with Meson or CMake, the official
upstream source artifact is often just a `git archive` *anyway* (albeit
with submodules replaced by their content, e.g. by `meson dist`); for
example, in bubblewrap[1], `git diff v0.9.0..upstream/0.9.0` is empty,
where upstream/0.9.0 is a `gbp import-orig` of the upstream source
artifact and v0.9.0 is the upstream tag. So there is little difference
between taking the upstream source artifact and making our own
`git archive`. For bubblewrap 0.9.0, the one advantage of the upstream
source artifact is that the upstream release manager (who happens to
have been me) has signed it, with a stronger-than-SHA1 hash.
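
For anyone who wants to make the same comparison without importing the
tarball into a packaging branch, here is a rough sketch in Python. It is
only an illustration, not part of tag2upload, gbp or git-deborig: the
function names are made up, and it assumes both tarballs unpack into a
single top-level directory, which is stripped before comparing.

```python
import hashlib
import io
import tarfile


def manifest(tar_bytes, strip_components=1):
    """Map each regular file's path to the SHA-256 of its content.

    The leading `strip_components` path elements are dropped, so that
    "pkg-1.0/src/a.c" and "pkg-v1.0/src/a.c" compare as "src/a.c".
    """
    result = {}
    # "r:*" lets tarfile detect gzip/bzip2/xz compression by itself.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            data = tar.extractfile(member).read()
            result["/".join(parts)] = hashlib.sha256(data).hexdigest()
    return result


def compare_trees(git_tar, release_tar):
    """Summarise how a release tarball differs from a `git archive`."""
    git_files = manifest(git_tar)
    release_files = manifest(release_tar)
    shared = set(git_files) & set(release_files)
    return {
        "only_in_release": sorted(set(release_files) - set(git_files)),
        "only_in_git": sorted(set(git_files) - set(release_files)),
        "changed": sorted(p for p in shared
                          if git_files[p] != release_files[p]),
    }
```

To try it on the bubblewrap example, one could write the upstream tag out
with `git archive --format=tar --prefix=bubblewrap-0.9.0/ -o git.tar v0.9.0`
and pass the bytes of git.tar and of the downloaded bubblewrap-0.9.0.tar.xz
to compare_trees(). Anything listed under "only_in_release" or "changed"
is content that a review of upstream git alone would not have covered.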

However, for packages like xz-utils that build with Autotools, the `make
dist` output can include a significant amount of source that is not always
straightforward to obtain any other way (for example modules vendored
from gnulib at a specified version, with no guarantee that a different
gnulib version would be compatible), together with a significant volume
of derived/non-source files that makes a meaningful review of the diff
between the git repo and the official source release difficult to achieve
(you'll see what I mean if you look at an older bubblewrap release, for
example with `git diff v0.8.0..upstream/0.8.0` [1]), and often, some
ambiguous not-quite-source not-quite-derived content that makes it
difficult to say with confidence what is source and what is not.

This is not unique to Autotools: in the Python packaging team we have
a similar tension between maintainers who say we should always use
upstream git as our basis for source packages, and maintainers who say
we should always use the "sdist" tarball that upstream released to PyPI
(usually not identical).

Of course, the xz-utils attacker was counting on it being difficult to
do a meaningful review of the diff between the git repo and the official
tarball release that they produced, and that diff is exactly where they
hid the glue code to activate their malicious payload.  So I think it's
valid to hope that key upstreams will move towards producing releases
that are as transparent and "nothing up my sleeve" as possible.

(However, many of the most key upstreams are overworked, presented
with incompatible demands such as modernizing their codebases but also
minimizing change and remaining compatible with obsolete platforms, and
in no position to change how they do their releases quickly; so it would
be easy to end up in the paradoxical situation where small/irrelevant/"toy"
projects have an easy audit trail, but the projects that we depend on
for our security remain difficult to audit.)

Our colleagues in other distributions often have workflows that have a
git-based code path and a tarball-based code path, usually preferring the
former: for instance Arch Linux PKGBUILDs usually start from a shallow clone
of a specified commit if possible, only falling back to tarballs if no
suitable git repo is available.

devref currently demands that we use official upstream source release
artifacts without repacking unless there are good reasons why we have to
(https://www.debian.org/doc/manuals/developers-reference/best-pkging-practices.en.html#repackaged-upstream-source).
I don't know how much of that is genuinely still best-practice in 2024 and
how much is institutional inertia, but right now, it's what the project
is claiming to be best-practice.

As far as I know, git-archive (and therefore git-deborig) doesn't guarantee
that repeatedly archiving the same git tree produces the same tarball,
which could be awkward for the ftp archive's tarball-integrity-based rules;
but hopefully tag2upload would insulate individual developers from that by
always "doing the right thing" for the current contents of the archive?

    smcv

[1] To follow along with these example commands, you'll need:
    gbp clone vcsgit:bubblewrap
    cd bubblewrap
    git remote add github https://github.com/containers/bubblewrap
    git remote update
