Hi Colin, Hi Debian,
On Sat, Feb 14, 2026 at 07:07:10PM +0000, Colin Watson wrote:
> On Fri, Feb 13, 2026 at 06:43:37PM +0100, Marc Haber wrote:
> > The monk in me would like to rebuild the full repository that way,
> > retroactively joining the past release tags that are now in the
> > repository with the points in the past when gbp import-orig
> > --pristine-tar /path/to/tarball was called on a release tarball.
> > Has this ever been done, or is it just too much work?
> It's certainly possible to do that sort of repository surgery, at the
> cost of changing commit IDs for everything after the first point in
> history where you add an additional parent, and so meaning anyone who
> already has your repository checked out would have to force-pull or
> re-clone. This will necessarily include rewriting all your existing
> tags.
That didn't leave me alone, and I spent the weekend with git and aide.
aide is one of my oldest packages, reasonably simple but still having
two decades of development history in git (obviously started in CVS, got
migrated to svn eventually, and then finally to git).
tl;dr: While desirable at first glance, I don't think that keeping a
close connection between Debian and upstream history will scale
over the years.
I'm going to share my experiences anyway. See my result on
https://salsa.debian.org/zugschlus/aide-rebuild
Just for the fun of it, just pull everything including tags and then try
git log --oneline --graph --all or gitk --all --tags. Frankly, I'm
actually surprised that gitk doesn't crash right away on this repo. Au
contraire, it actually performs decently while displaying this chaos of
back-and-forward merging.
I would like to hear your opinions and comments on what could be done
better.
The commit IDs of the upstream repository have not been changed. The
upstream tags are aligned with the upstream tarballs that were pulled
from The Archive, and the Debian release tags are aligned with the .dsc
files in The Archive. In that sense, I did not just rewrite history, I
regenerated it. Almost everything else has changed. For historic
documentation and auditing, I think that the -legacy branches and tags
should be left around.
Sadly, just running git log on debian/latest (which is probably the
single most important branch) only shows merge commits now. You need git
log --graph to see more. I hope that when development is continued from
here, plain git log history is going to be a bit more informative.
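For a merge-heavy branch like this, two standard options can make the
output easier to read: --first-parent follows only the mainline of a
branch, and --no-merges hides the merge commits and shows the work
underneath. The following is a minimal sketch in a throwaway repository;
the commit messages and the demo identity are made up for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# One packaging commit on debian/latest, two commits on upstream/latest.
git symbolic-ref HEAD refs/heads/debian/latest
git commit -q --allow-empty -m "initial packaging"
git checkout -q -b upstream/latest
git commit -q --allow-empty -m "upstream change 1"
git commit -q --allow-empty -m "upstream change 2"
git checkout -q debian/latest
git merge -q --no-ff -m "merge upstream release" upstream/latest

# Mainline only: the initial commit and the merge.
git log --first-parent --oneline debian/latest
# Everything except merge commits: the actual work.
git log --no-merges --oneline debian/latest

mainline=$(git rev-list --count --first-parent debian/latest)
work=$(git rev-list --count --no-merges debian/latest)
echo "mainline=$mainline work=$work"
```

Neither option changes the repository; they only select which commits
git log walks and prints.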
That also means that a repository that hadn't been converted but
continued to grow through daily use would only be half as messy. I still
think that this method of repository operation will not scale if one
looks at periods of decades. It is nice and pretty when you start fresh
and only look at the first handful of upstream releases and one or two
Debian releases per upstream release.
But once you begin packaging development snapshots and upstream
maintenance releases and make your own detours through experimental, the
mess begins.
While I have learned quite a lot about git and merging in the last days,
I am still far away from claiming that I understand those things.
I roughly followed the algorithm I outlined, twice: once manually on
Saturday; then I decided to throw everything away because I hadn't
followed my own algorithm reliably, wrote some primitive tooling and
spent Sunday doing things again, this time with software support. The
second time wasn't considerably quicker.
And sadly, I realized too late that git-buildpackage probably offers a
couple of nice Python functions that already encapsulate and abstract
git, which would have made my tooling a bit more readable. But for a
one-off proof of concept it was enough.
== Preparation
The first thing I had to do was rename the existing debian/, upstream/
and pristine-tar branches to debian-legacy, upstream-legacy and
pristine-tar-legacy and build a timeline of the package from the
release history in a text file. I also pulled all available package
versions from snapshot.debian.org (thanks, debsnap) and decided that The
Archive should be the primary source of truth.
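The renaming step could be sketched roughly like this. It runs in a
throwaway repository with one branch per namespace; the branch names
under debian/ and upstream/ are stand-ins, and a real repository would
have more of them (and tags, which this sketch does not touch).

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Stand-in for the existing packaging repository.
git symbolic-ref HEAD refs/heads/debian/latest
git commit -q --allow-empty -m "demo commit"
git branch upstream/latest
git branch pristine-tar

# Move debian/* to debian-legacy/* and upstream/* to upstream-legacy/*.
for ref in $(git for-each-ref --format='%(refname:short)' \
        refs/heads/debian refs/heads/upstream); do
    git branch -m "$ref" "${ref%%/*}-legacy/${ref#*/}"
done
git branch -m pristine-tar pristine-tar-legacy

git for-each-ref --format='%(refname:short)' refs/heads
```

git branch -m only renames local branches; branches on a remote would
have to be pushed under the new name and deleted under the old one.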
== Handling of early history and probably the big mistake
Since the Debian vcs history goes back further than upstream's, I
started a new upstream/early branch with the beginning of
debian-legacy/latest and used that one as upstream branch for the time
until upstream's history begins. upstream/latest was branched off
upstream's first commit.
Since the early commits of Debian's repository cheerfully mix up Debian
and upstream commits, this is not clinically clean. At the appropriate
points for the first two upstream releases that we only have as upstream
tarballs and not as history, I did gbp import-orig and made sure that
upstream/early had the same tree as the tarballs for the "upstream
release tags". The Debian commits that followed added some Debian stuff
again, which clutters upstream/early up to the point where upstream's
history begins.
I then merged upstream/early into upstream/latest (which at this time
only contained upstream's first commit), building the connection between
ancient and old upstream history. I then threw away all of the merged
content, deleted all the files from the branch, checked out all the
files from the first upstream commit and committed them to
upstream/latest, keeping the formal connection to the early history from
the Debian repository while starting with upstream's initial tree.
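The end result of that merge (a second parent pointing at the early
history, but the tree of upstream's first commit) can also be reached in
one step with git's "ours" merge strategy. This is a shortcut, not the
sequence of steps described above, and the sketch below uses a throwaway
repository with made-up file names to show the idea.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Stand-in for the early history taken from the Debian repository.
git symbolic-ref HEAD refs/heads/upstream/early
echo "mixed Debian and upstream content" > early.txt
git add early.txt
git commit -q -m "early history"

# Stand-in for upstream's first commit: a new root, no shared ancestry.
git checkout -q --orphan upstream/latest
git rm -q -rf .
echo "upstream source" > aide.c
git add aide.c
git commit -q -m "upstream's first commit"
first_upstream=$(git rev-parse HEAD)

# Record upstream/early as a second parent, but keep the current tree.
git merge -q -s ours --allow-unrelated-histories \
    -m "Connect early history" upstream/early

git log --graph --oneline upstream/latest
```

The merge commit has both histories as ancestors, while its tree is
exactly the tree of upstream's first commit.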
Of course, this made upstream/latest's commit IDs diverge from
upstream's master branch. While a normal upstream/latest of sources that
didn't need repacking for Debian purposes directly shows the upstream
commits in its history, this upstream/latest is only made of merge
commits, not explicitly showing the upstream history but rather the
points where debian packaged an upstream release. gitk upstream/latest
and git log --graph upstream/latest show the two timelines and the
points where they intersect nicely though.
If I had decided that I didn't need the connection to the first 50
commits from Jul 1999 to Jun 2003, upstream/latest would show a clearer
connection to the following 22 years of upstream development.
I had to put some additional tags on the upstream repository, especially
when upstream snapshots were packaged. I put them in my own namespace
(upstream-snapshot/debian-upstream-version) while choosing the
debian-upstream-version according to what was chosen in the past to
generate the "upstream" tarball.
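A sketch of such a snapshot tag, in a throwaway repository. The version
string and the mangle_version helper are hypothetical; the helper only
applies the substitution that DEP-14 describes for characters git does
not allow in ref names ("~" becomes "_", ":" becomes "%").

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

git commit -q --allow-empty -m "upstream development commit"
snapshot_commit=$(git rev-parse HEAD)

# Turn a Debian upstream version into a valid tag name component.
mangle_version() {
    printf '%s' "$1" | tr '~:' '_%'
}

version="0.19~dev20200101"   # hypothetical snapshot version
tag="upstream-snapshot/$(mangle_version "$version")"
git tag -a -m "Upstream snapshot packaged as $version" \
    "$tag" "$snapshot_commit"

git tag --list 'upstream-snapshot/*'
```

Keeping these tags in a separate namespace means they cannot collide
with tags that upstream creates later.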
== Before git and gbp
Things got a bit more mechanized when my rebuilding of the package
reached the point where I had an upstream history without Debian
content, a continuing history of Debian commits with occasional upstream
merges, and the complete source archives from snapshot.d.o.
Things got even easier when I reached the point where I had started
using git-buildpackage myself.
The workflow that established itself was like this:
New upstream release:
- make sure that the upstream release tag is at the right place
- merge the upstream release tag into upstream/latest¹
- switch back to debian/latest
- use gbp import-orig --pristine-tar --upstream-vcs-tag=<upstream tag>
  aide_*.orig.tar.gz
  - this also sets the upstream/<version> tag and merges the upstream
    code into debian/latest.
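The steps above can be sketched as a small script. It only prints the
commands by default, since it needs a real packaging repository and gbp
to do anything useful. The tag name v0.18.2 and the tarball path are
hypothetical, and the sketch assumes that gbp.conf already names
upstream/latest and debian/latest as the upstream and Debian branches.

```shell
set -eu
# Print each command instead of running it; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

new_upstream_release() {
    upstream_tag=$1   # upstream's own release tag
    tarball=$2        # orig tarball of the same release
    # Check that the tag exists and points at a commit.
    run git rev-parse --verify "${upstream_tag}^{commit}"
    # Merge the upstream release into upstream/latest.
    run git checkout upstream/latest
    run git merge --no-edit "$upstream_tag"
    # Import the tarball, tied to the upstream tag, from debian/latest.
    run git checkout debian/latest
    run gbp import-orig --pristine-tar \
        --upstream-vcs-tag="$upstream_tag" "$tarball"
}

plan=$(new_upstream_release v0.18.2 ../aide_0.18.2.orig.tar.gz)
echo "$plan"
```

Printing the plan first makes it easy to compare against the timeline
before anything is committed.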
aide's upstream had a mainly linear development history in the
beginning. Things got more complicated when they started branching
off a stable maintenance branch and releasing 0.x.y from there while
development continued on master.
gbp import-orig nicely took care of the differences between upstream git
and the release tarball. This would also have been the step where the
xz-utils attack would have happened and found its way into Debian git.
The next merge from git would probably have removed the release
artifacts again, wouldn't it?
New Debian version:
- make sure that the debian-legacy release tag is at the right place
- merge the debian-legacy release tag into debian/latest
  - alternatively, it would probably have been possible to
    git rebase --onto debian/latest
    debian-legacy/lastrelease debian-legacy/thisrelease, but that
    would have been at the expense of not having a connection between
    the old commit history and the new one, needing more audits.
- make sure that debian-legacy/thisrelease and debian/latest match
  (this is probably paranoid)
- use gbp import-dsc --pristine-tar to import the dsc (and create the
  debian tag).
- make sure that debian/latest now matches the source package (extract
  with dpkg-source -x --skip-patches).
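The "make sure that A and B match" checks can be done by comparing tree
objects: two revisions with different histories match exactly when they
point at the same tree. A minimal sketch in a throwaway repository; the
same_tree helper and the version 0.18-1 are made up for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# True if both revisions point at identical trees.
same_tree() {
    [ "$(git rev-parse "$1^{tree}")" = "$(git rev-parse "$2^{tree}")" ]
}

# Legacy history with a release tag.
git symbolic-ref HEAD refs/heads/debian-legacy/latest
echo "aide (0.18-1) unstable; urgency=low" > changelog
git add changelog
git commit -q -m "legacy packaging commit"
git tag debian-legacy/0.18-1

# Rebuilt branch: a different commit, but the same content.
git checkout -q --orphan debian/latest
git commit -q -m "rebuilt packaging commit"

if same_tree debian-legacy/0.18-1 debian/latest; then
    match=yes
else
    match=no
fi
echo "trees match: $match"
```

When the trees differ, git diff --stat between the two revisions shows
which files are responsible.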
== Packaging development snapshots and maintenance branches
The same procedure also applies to a packaged development snapshot, only
that the "upstream" release tag must be set by yourself. I would advise
future package maintainers to be more intelligent when synthesizing
their version numbers for upstream development snapshots. I ended up
stupidly packaging 0.18.1 (from the 0.18.x maintenance branch), then
pulling a development snapshot (from master), calling it
0.18.1.1.yyyymmdd, and then pulling 0.18.2 from the maintenance branch
again. Calling the development snapshot something along the lines of
0.19~devyyyymmdd would have been wiser.
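How the two naming schemes sort can be checked with dpkg itself. The
sketch below assumes dpkg is installed and uses a made-up date in place
of yyyymmdd; the older helper is only a thin wrapper for readability.

```shell
set -eu
# "lt" is dpkg's strictly-less-than version comparison.
older() { dpkg --compare-versions "$1" lt "$2"; }

if command -v dpkg >/dev/null 2>&1; then
    # The snapshot from master sorts between two maintenance releases,
    # although its content is newer than both of them.
    older 0.18.1 0.18.1.1.20230101 &&
        older 0.18.1.1.20230101 0.18.2 &&
        echo "0.18.1 < 0.18.1.1.20230101 < 0.18.2"

    # A ~dev version sorts after 0.18.2 and before the final 0.19.
    older 0.18.2 '0.19~dev20230101' &&
        older '0.19~dev20230101' 0.19 &&
        echo "0.18.2 < 0.19~dev20230101 < 0.19"
else
    echo "dpkg not available, skipping comparison"
fi
```

With the second scheme, later uploads from the maintenance branch and
the eventual 0.19 release both sort where one would expect.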
I eventually decided to backtrack in my upstream/latest branch and
branch off an upstream/0.18.x branch, going along with the branch scheme
that upstream used, while trying to keep a linear history in
debian/latest even when some packages went to experimental instead of
unstable. It would probably be a good idea to have debian/latest stay on
the last unstable upload, branch off debian/exp-yyyymmdd, do the
experimental stuff there and then merge debian/exp-yyyymmdd back
into debian/latest when the experimental changes go to unstable (this is
also a nice reminder to include the experimental parts of the changelog
when doing the unstable upload). I wonder how dgit/tag2upload will
handle those cases.
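The experimental branch idea could look roughly like this in a throwaway
repository. The branch date and commit messages are placeholders. The
range debian/latest..debian/exp-20230101 lists what the merge would
bring in, which is also the list of changelog entries to carry over.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

git symbolic-ref HEAD refs/heads/debian/latest
git commit -q --allow-empty -m "last unstable upload"

# Experimental work happens on its own branch.
git checkout -q -b debian/exp-20230101
git commit -q --allow-empty -m "experimental upload 1"
git commit -q --allow-empty -m "experimental upload 2"

# Commits that are in experimental but not yet in debian/latest.
pending=$(git rev-list --count debian/latest..debian/exp-20230101)
echo "pending experimental commits: $pending"

# Bring the experimental work back for the unstable upload.
git checkout -q debian/latest
git merge -q --no-ff -m "Merge experimental work for unstable" \
    debian/exp-20230101
remaining=$(git rev-list --count debian/latest..debian/exp-20230101)
echo "remaining after merge: $remaining"
```

Using --no-ff keeps one visible merge commit per return from
experimental, even when debian/latest has not moved in the meantime.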
One of the biggest lessons is to stick to the way upstream works once
you start packaging things that they didn't formally release.
And we need more docs about Debian package repository management for the
more complicated cases. We have enough (contradicting!) docs about
bringing a new thing into Debian; now we need some docs that help you
take something that already exists and bring it into shape.
Congratulations if you have read up to this point. This is way too long.
But it was also a weekend's worth of concentrated work.
> I've only done this sort of thing myself when I was changing revision
> control systems anyway, and I hope that's all in the past for me now.
> However, I think "git filter-repo" (in the separate git-filter-repo
> package) could probably do most of it for you, since I don't think you
> want the trees associated with any of these commits to change; you
> just want to graft an extra parent onto each of the commits on your
> current upstream/latest branch and then rewrite all their descendants
> to match. See the two sections in git-filter-repo(1) headed "Parent
> rewriting", and do this in a separate clone of your repository and
> read at least the "DISCUSSION" section before doing anything. I would
> definitely not do it by manually re-committing everything as you
> suggest, even in some kind of script - there are much better tools
> than that available.
Sadly, that was too late. I think that the merge orgy was vastly more
educative.
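For reference, grafting an extra parent onto an existing commit can be
tried out locally with git replace --graft, which is plain git and not
filter-repo itself. The sketch below uses a throwaway repository with
made-up branch names. The graft only lives in refs/replace/ of the
local clone; making it permanent would still need a history rewrite of
the kind described in the quoted text.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Two tarball imports on upstream/latest.
git symbolic-ref HEAD refs/heads/upstream/latest
git commit -q --allow-empty -m "imported release tarball 1"
git commit -q --allow-empty -m "imported release tarball 2"
import_commit=$(git rev-parse HEAD)
old_parent=$(git rev-parse HEAD^)

# Stand-in for the matching commit in upstream's own history.
git checkout -q --orphan upstream-vcs
git commit -q --allow-empty -m "upstream release commit"
release_commit=$(git rev-parse HEAD)
git checkout -q upstream/latest

# Pretend the import commit also has the release commit as a parent.
git replace --graft "$import_commit" "$old_parent" "$release_commit"

# Count "commit + parents" with and without the replacement applied.
grafted=$(($(git rev-list --parents -n 1 "$import_commit" | wc -w)))
original=$(($(git --no-replace-objects rev-list --parents -n 1 \
    "$import_commit" | wc -w)))
echo "with graft: $grafted words, without: $original words"
```

The commit ID of upstream/latest does not change at this stage, which
makes it a cheap way to look at the resulting graph before committing
to a rewrite.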
> I'm not personally sure that it's worth the effort and consequences
> though.
Neither am I. I am still glad I did that since I am not so sure any more
whether keeping upstream history in Debian git while trying to build a
connection between the two will scale over decades, or whether it will
just make our repositories incomprehensible as the years pass.
Greetings
Marc
P.S.: I hate temporal mechanics.
--
-----------------------------------------------------------------------------
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421