Hi Colin, Hi Debian,

On Sat, Feb 14, 2026 at 07:07:10PM +0000, Colin Watson wrote:
> On Fri, Feb 13, 2026 at 06:43:37PM +0100, Marc Haber wrote:
> > The monk in me would like to rebuild the full repository that way, retroactively joining the past release tags that are now in the repository with the points in the past when gbp import-orig --pristine-tar /path/to/tarball was called on a release tarball.
> >
> > Has this ever been done, or is it just too much work?
>
> It's certainly possible to do that sort of repository surgery, at the cost of changing commit IDs for everything after the first point in history where you add an additional parent, and so meaning anyone who already has your repository checked out would have to force-pull or re-clone. This will necessarily include rewriting all your existing tags.

That didn't leave me alone, and I spent the weekend with git and aide. aide is one of my oldest packages, reasonably simple but still having two decades of development history in git (obviously started in CVS, got migrated to svn eventually, and then finally to git).

tl;dr: While desirable at first glance, I don't think that keeping a close connection between Debian and upstream will scale over the years.

I'm going to share my experiences anyway. See my result on <https://salsa.debian.org/zugschlus/aide-rebuild>. Just for the fun of it, pull everything including tags and then try git log --oneline --graph --all or gitk --all --tags. Frankly, I'm surprised that gitk doesn't crash right away on this repo. Au contraire, it actually performs decently while displaying this chaos of back-and-forth merging.

I would like to hear your opinions and comments on what could be done better.

The commit IDs of the upstream repository have not been changed. The upstream tags are aligned with the upstream tarballs that were pulled from The Archive, and the Debian release tags are aligned with the .dsc files in The Archive. Beyond that, I did not just rewrite history, I regenerated it: almost everything else has changed. For historical documentation and auditing, I think that the -legacy branches and tags should be left around.

Sadly, just running git log on debian/latest (which is probably the single most important branch) only shows merge commits now. You need git log --graph to see more. I hope that when development continues from here, plain git log history is going to be a bit more informative. That also means that a repository that had not been converted but had grown through daily use would only be half as messy. I still think that this method of repository operation will not scale over periods of decades. It is nice and pretty when you start fresh and only look at the first handful of upstream releases with one or two Debian releases per upstream release.

But once you begin packaging development snapshots and upstream maintenance releases and take your own detours through experimental, the mess begins.

While I have learned quite a lot about git and merging in the last few days, I am still far away from claiming that I understand those things.

I roughly followed the algorithm I outlined, twice: once manually on Saturday, after which I decided to throw everything away because I hadn't followed my own algorithm reliably. I then wrote some primitive tooling and spent Sunday doing things again, this time with software support. The second time wasn't considerably quicker.

And sadly, I realized too late that git-buildpackage probably offers a couple of nice Python functions that already encapsulate and abstract git, which would have made my tooling a bit more readable. But for a one-off proof of concept it was enough.

== Preparation

The first thing I had to do was rename the existing debian/, upstream/ and pristine-tar branches to debian-legacy, upstream-legacy and pristine-tar-legacy and build a timeline of the package from the release history in a text file. I also pulled all available package versions from snapshot.debian.org (thanks, debsnap) and decided that The Archive should be the primary source of truth.
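
The branch renaming can be scripted. A minimal sketch, assuming the old branches live under debian/ and upstream/ plus a single pristine-tar branch; the function names legacy_name and rename_to_legacy are made up for illustration and are not part of any tool:

```shell
# Map a branch name to its legacy counterpart, e.g.
#   debian/latest -> debian-legacy/latest
#   pristine-tar  -> pristine-tar-legacy
# Returns non-zero for branches that should be left alone.
legacy_name() {
    case "$1" in
        debian/*)     printf 'debian-legacy/%s\n' "${1#debian/}" ;;
        upstream/*)   printf 'upstream-legacy/%s\n' "${1#upstream/}" ;;
        pristine-tar) printf 'pristine-tar-legacy\n' ;;
        *)            return 1 ;;
    esac
}

# Rename all matching local branches of the repository in the current directory.
rename_to_legacy() {
    git for-each-ref --format='%(refname:short)' refs/heads/ |
    while read -r branch; do
        if new=$(legacy_name "$branch"); then
            git branch -m "$branch" "$new"
        fi
    done
}
```

Calling rename_to_legacy inside a clone only touches local branches; tags and remote-tracking refs would need separate handling.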

== Handling of early history and probably the big mistake

Since the Debian vcs history goes back further than upstream's, I started a new upstream/early branch with the beginning of debian-legacy/latest and used that one as upstream branch for the time until upstream's history begins. upstream/latest was branched off upstream's first commit.

Since the early commits of Debian's repository cheerfully mix up Debian and upstream commits, this is not clinically clean. At the appropriate points for the first two upstream releases, which we only have as upstream tarballs and not as history, I did gbp import-orig and made sure that upstream/early had the same tree as the tarballs for the "upstream release tags". The Debian commits that followed added some Debian stuff again, which clutters upstream/early up to the point where upstream's history begins.

I then merged upstream/early into upstream/latest (which at this time only contained upstream's first commit), building the connection between ancient and old upstream history. I then threw away all of the merged content, deleted all the files from the branch, checked out all the files from the first upstream commit and committed them to upstream/latest, keeping the formal connection to the early history from the Debian repository while starting with upstream's initial tree.
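
For comparison, git's "ours" merge strategy yields the same kind of result in a single merge commit: the early branch becomes a parent, and the tree stays exactly that of the branch being merged into. This is a shortcut sketch, not the sequence of steps described above; the function name join_early_history is hypothetical.

```shell
# join_early_history EARLY TARGET
#   EARLY   branch with the unrelated early history, e.g. upstream/early
#   TARGET  branch currently at upstream's first commit, e.g. upstream/latest
# Records EARLY as a second parent of TARGET without changing TARGET's tree.
join_early_history() {
    early=$1
    target=$2
    git checkout -q "$target" &&
    git merge -q -s ours --allow-unrelated-histories \
        -m "Connect $early to $target, keeping the tree of $target" "$early"
}
```

Because the two branches share no common ancestor, --allow-unrelated-histories is passed explicitly.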

Of course, this made upstream/latest's commit IDs diverge from upstream's master branch. While a normal upstream/latest of sources that didn't need repacking for Debian purposes directly shows the upstream commits in its history, this upstream/latest is only made of merge commits, not explicitly showing the upstream history but rather the points where Debian packaged an upstream release. gitk upstream/latest and git log --graph upstream/latest show the two timelines and the points where they intersect nicely though.

If I had decided that I did not need the connection to the first 50 commits from Jul 1999 to Jun 2003, upstream/latest would show a clearer connection to the following 22 years of upstream development.

I had to put some additional tags on the upstream repository, especially where upstream snapshots were packaged. I put them in my own namespace (upstream-snapshot/debian-upstream-version), choosing the debian-upstream-version according to what had been chosen in the past to generate the "upstream" tarball.
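
Setting such a tag could look like the sketch below. One detail is an assumption: git ref names cannot contain "~" or ":", so the version is mangled here the way git-buildpackage mangles versions for its own tags; the text above does not say which mangling was actually used. The function names are hypothetical.

```shell
# Replace the characters that git refuses in ref names, following the
# git-buildpackage convention ("~" becomes "_", ":" becomes "%").
mangle_version() {
    printf '%s\n' "$1" | tr '~:' '_%'
}

# tag_snapshot VERSION COMMIT
# Creates a lightweight tag upstream-snapshot/<mangled VERSION> on COMMIT.
tag_snapshot() {
    git tag "upstream-snapshot/$(mangle_version "$1")" "$2"
}
```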

== Before git and gbp

Things got a bit more mechanized when my rebuild of the package reached the point where I had an upstream history without Debian content, a continuing history of Debian commits with occasional upstream merges, and the complete source archives from snapshot.d.o.

Things got even easier when I reached the point where I had started using git-buildpackage myself.

The workflow that established itself looked like this:

New upstream release:
  - make sure that upstream release tag is at the right place
  - merge upstream release tag into upstream/latest¹
  - switch back to debian/latest
  - use gbp import-orig --pristine-tar --upstream-vcs-tag aide_*.orig.tar.gz
    - this also sets the upstream/<version> tag and merges the upstream code into debian/latest.
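
As a rough sketch, the steps above map onto these commands. Assumptions: upstream's own release tag is called v<version> (a placeholder, adjust to the real tag name), and debian/gbp.conf already points gbp at debian/latest and upstream/latest. The run wrapper and the function name are made up; with DRY_RUN set, the commands are only printed.

```shell
# Print each command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# import_upstream_release VERSION TARBALL
import_upstream_release() {
    version=$1
    tarball=$2
    upstream_tag="v$version"   # hypothetical name of upstream's release tag
    run git checkout upstream/latest &&
    run git merge --no-edit "$upstream_tag" &&
    run git checkout debian/latest &&
    run gbp import-orig --pristine-tar \
        --upstream-version="$version" \
        --upstream-vcs-tag="$upstream_tag" "$tarball"
}
```

Running DRY_RUN=1 import_upstream_release 0.18.2 aide_0.18.2.orig.tar.gz first shows what would be executed before anything is changed.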

aide's upstream had a mainly linear development history in the beginning. Things got more complicated when they started branching off a stable maintenance branch and releasing 0.x.y from there while development continued.

gbp import-orig nicely took care of the differences between upstream git and the release tarball. This would also have been the step where the xz-utils attack would have happened and found its way into Debian git. The next merge from git would probably have removed the release artifacts again, wouldn't it?

New Debian version:
  - make sure that debian-legacy release tag is at the right place
  - merge debian-legacy release tag into debian/latest
    - alternatively it would probably have been possible to
      git rebase --onto debian/latest debian-legacy/lastrelease debian-legacy/thisrelease,
      but that would have been at the expense of not having a connection between the old commit history and the new one, needing more audits.
  - make sure that debian-legacy/thisrelease and debian/latest match
    (this is probably paranoid)
  - use gbp import-dsc --pristine-tar to import the dsc (and create the debian tag).
  - make sure that debian/latest now matches the source package (extract with dpkg-source -x --skip-patches).
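
The Debian side of the loop can be sketched the same way. The legacy tag name, the scratch directory and the function name are placeholders, and the final comparison assumes GNU diff; with DRY_RUN set, the commands are only printed.

```shell
# Print each command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# import_debian_release LEGACY_TAG DSC EXTRACT_DIR
#   LEGACY_TAG   release tag from the old history
#   DSC          source package fetched from the archive
#   EXTRACT_DIR  scratch directory that must not exist yet
# "git diff --quiet" is the paranoia check: it fails, and stops the chain,
# if the merged branch and the legacy tag do not have identical trees.
import_debian_release() {
    legacy_tag=$1
    dsc=$2
    extract_dir=$3
    run git checkout debian/latest &&
    run git merge --no-edit "$legacy_tag" &&
    run git diff --quiet "$legacy_tag" debian/latest &&
    run gbp import-dsc --pristine-tar "$dsc" &&
    run dpkg-source -x --skip-patches "$dsc" "$extract_dir" &&
    run diff -r --exclude=.git "$extract_dir" .
}
```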

== Packaging development snapshots and maintenance branches

The same procedure also applies to a packaged development snapshot, except that the "upstream" release tag must be set by yourself. I would advise future package maintainers to be more intelligent when synthesizing their version numbers for upstream development snapshots. I ended up stupidly packaging 0.18.1 (from the 0.18.x maintenance branch), then pulling a development snapshot (from master), calling it 0.18.1.1.yyyymmdd, and then pulling 0.18.2 from the maintenance branch again. Calling the development snapshot something along the lines of 0.19~devyyyymmdd would have been wiser.
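
Where a candidate version number sorts can be checked with dpkg before committing to a scheme. A small sketch; the helper names are made up and the dates in the usage note are placeholders:

```shell
# Succeeds if version $1 sorts strictly before version $2 in Debian's ordering.
sorts_before() {
    dpkg --compare-versions "$1" lt "$2"
}

# Print how two versions relate, using the changelog-style << marker.
show_order() {
    if sorts_before "$1" "$2"; then
        echo "$1 << $2"
    else
        echo "$1 is not earlier than $2"
    fi
}
```

For example, show_order 0.18.1.1.20250101 0.18.2 and show_order 0.19~dev20250101 0.19 both report the first version as earlier. So does show_order 0.18.2 0.19~dev20250101, which means a snapshot named after the next development series sorts after every 0.18.x maintenance release.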

I eventually decided to backtrack in my upstream/latest branch and branch off an upstream/0.18.x branch, going along with the branch scheme that upstream used, while trying to keep a linear history in debian/latest even when some packages went to experimental instead of unstable. It would probably be a good idea to have debian/latest stay on the last unstable upload, branch off debian/exp-yyyymmdd, do the experimental work there and then merge debian/exp-yyyymmdd back into debian/latest when the experimental changes go to unstable (this is also a nice reminder to include the experimental parts of the changelog when doing the unstable upload). I wonder how dgit/tag2upload will handle those cases.
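
The suggested experimental detour would boil down to two small steps. A sketch with hypothetical function names; the date stamp is passed in as an argument:

```shell
# start_experimental YYYYMMDD
# Branch off debian/latest, which stays on the last unstable upload.
start_experimental() {
    git checkout -q -b "debian/exp-$1" debian/latest
}

# merge_experimental YYYYMMDD
# Bring the experimental work back once it is ready for unstable. --no-ff
# forces a merge commit so the detour stays visible in git log --graph.
merge_experimental() {
    git checkout -q debian/latest &&
    git merge -q --no-ff --no-edit "debian/exp-$1"
}
```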

One of the biggest lessons is to stick to the way upstream works once you start packaging things that they didn't formally release.

And we need more docs about Debian package repository management for the more complicated cases. We have enough (contradicting!) docs about bringing a new thing into Debian; now we need some docs that help you take something that already exists and bring it up to shape.

Congratulations if you have read up to this point. This is way too long. But it was also a weekend's worth of concentrated work.

> I've only done this sort of thing myself when I was changing revision control systems anyway, and I hope that's all in the past for me now. However, I think "git filter-repo" (in the separate git-filter-repo package) could probably do most of it for you, since I don't think you want the trees associated with any of these commits to change; you just want to graft an extra parent onto each of the commits on your current upstream/latest branch and then rewrite all their descendants to match. See the two sections in git-filter-repo(1) headed "Parent rewriting", and do this in a separate clone of your repository and read at least the "DISCUSSION" section before doing anything. I would definitely not do it by manually re-committing everything as you suggest, even in some kind of script - there are much better tools than that available.

Sadly, that was too late. I think that the merge orgy was vastly more educational.

> I'm not personally sure that it's worth the effort and consequences though.

Neither am I. I am still glad I did that, since I am no longer so sure whether keeping upstream history in Debian git while trying to build a connection between the two will scale over decades, or whether it will just make our repositories incomprehensible as the years pass.
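
For anyone who wants to try the grafting route Colin describes, the grafting step itself needs only core git. A minimal sketch with a hypothetical function name; it creates a replacement ref, which is local and reversible (git replace -d). Making the result permanent is the job of git filter-repo, as described in the manual sections he names; that step is deliberately left out here.

```shell
# graft_parent COMMIT EXTRA_PARENT
# Re-declare COMMIT's parents as its current parents plus EXTRA_PARENT.
# The tree and the message of COMMIT stay untouched.
graft_parent() {
    commit=$1
    extra=$2
    # %P prints the existing parent IDs, space separated (empty for a root
    # commit); word splitting of the unquoted expansion is intended here.
    # shellcheck disable=SC2046
    git replace --graft "$commit" $(git log -1 --format=%P "$commit") "$extra"
}
```

As the quoted advice says, this is best tried in a separate clone.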

Greetings
Marc

P.S.: I hate temporal mechanics.

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
