Re: Upstreams with "official" tarballs differing from their git

Jeremy Stanley Sat, 15 Feb 2025 06:22:39 -0800

On 2025-02-15 12:33:16 +0100 (+0100), Daniel Gröber wrote:
[...]
> FYI: If all upstream wants is git metadata I like to introduce them to the
> wonderful, but obscure, git `export-subst` feature. See git-attributes(1).
> 
> Works with forges, git-archive and everything.
> 
> Example:
> https://github.com/YosysHQ/yosys/commit/222e7a2da345f01980d9261c40c5d50eced4f9ab
> though this was later improved by others
> https://github.com/YosysHQ/yosys/commit/9d15f1d6ac4a9ff2e1f87cda8c366659027fb76f
> 
> If that's not enough can you point us to what this upstream is doing exactly?


I'm not that particular upstream, but with my upstream hat for other
projects on, there's a lot of data in a Git repository that our
source tarball build process relies on but isn't strictly files in
the checked-out worktree: names and addresses of all commit authors,
most recent tag in the checkout's history and number of commits
following it, presence of certain footer lines in commit messages
after the most recent tag, tags in the checkout's history matching a
specific pattern which come immediately after the introduction of
each file in a certain directory... these are things easily queried
from a (non-shallow) clone of the repository but which aren't simple
string substitutions.
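
To make the kind of query described above concrete, here is a
minimal sketch in Python. It is an illustration only, not the
tooling any particular project uses: the helper names
(parse_describe, unique_authors, collect_metadata) are hypothetical,
and it assumes a `git` binary on the PATH and a full clone that has
at least one tag reachable from HEAD.

```python
import subprocess


def parse_describe(described):
    """Split `git describe --long` output into (tag, commits_since, commit)."""
    # The output looks like <tag>-<count>-g<abbrev>. The tag itself may
    # contain hyphens, so split from the right.
    tag, count, ghash = described.strip().rsplit("-", 2)
    if not ghash.startswith("g"):
        raise ValueError(f"unexpected describe output: {described!r}")
    return tag, int(count), ghash[1:]


def unique_authors(log_output):
    """Deduplicate and sort 'Name <email>' lines taken from `git log`."""
    return sorted({line.strip() for line in log_output.splitlines() if line.strip()})


def git(*args, cwd="."):
    """Run a git command and return its stdout (raises if git fails)."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def collect_metadata(repo="."):
    """Gather a few facts that live in history rather than in the worktree."""
    tag, count, commit = parse_describe(
        git("describe", "--tags", "--long", cwd=repo)
    )
    authors = unique_authors(git("log", "--format=%aN <%aE>", cwd=repo))
    return {"tag": tag, "commits_since_tag": count, "commit": commit,
            "authors": authors}
```

Both queries need the complete history: in a shallow clone the most
recent tag may be unreachable and the author list is truncated.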

There are several reasons for this complexity: First and foremost,
when these projects started, Git was still fairly new on the scene,
and most distros preferred or even required source tarballs for
packaging. Second, the projects' maintainers were burned on multiple
occasions by mistakes where metadata duplicated from Git and
committed into the file tree ended up out of sync or straddling
release points, so they developed ways to avoid the duplication and
risk of divergence by extracting that data from Git at dist build
time. Third, because the projects needed to deal with heavy volumes
of development activity from many contributors in parallel, they
relied on a distributed parallel approval model with the fewest
possible coordination chokepoints, so they needed to support
independent features merging in arbitrary order with things like
release notes sorted out automatically whenever a release got
tagged.

I would argue that our source tarballs don't exactly "differ" from
what's in Git; they include content which isn't solely represented
by the worktree files in a corresponding checkout, but is still data
extracted from the corresponding Git repository state. Downstream
distros can choose to use our official signed release source
tarballs, or run the tarball build process themselves from a full
checkout of our Git repositories, but just naively dumping the file
tree from a git checkout or even using a shallow clone is inadequate
and we expressly do not support those workflows (if someone insists
on doing that, it's on them to make it work, and to check that
they're not omitting things the copyright license references such as
a generated authors file).
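
A downstream build script could guard against the shallow-clone case
before attempting a tarball build. The following is a rough sketch
under stated assumptions, not something any upstream ships: the
function names are hypothetical, and it relies on Git recording
shallow boundaries in a file named `shallow` inside the repository's
Git directory, assuming an ordinary checkout where `.git` is a
directory (not a linked worktree or submodule). Recent Git versions
can also answer this directly with
`git rev-parse --is-shallow-repository`.

```python
from pathlib import Path


def is_shallow_checkout(worktree):
    """Heuristic: a shallow clone has a `shallow` file in its .git directory."""
    return (Path(worktree) / ".git" / "shallow").is_file()


def require_full_clone(worktree):
    """Refuse to continue when history-derived metadata would be incomplete."""
    if is_shallow_checkout(worktree):
        raise RuntimeError(
            f"{worktree} is a shallow clone; fetch full history "
            "(for example with `git fetch --unshallow`) before building"
        )
```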

Unfortunately, package maintainers sometimes like to insist that
upstream projects' workflows are "wrong" because the choices they've
made differ from how other projects might choose to develop
software, but communities are unique: they often face different
challenges that need their own solutions, or aren't willing to
compromise by adopting partial solutions popular elsewhere just to
conform.
-- 
Jeremy Stanley
