On Fri, Aug 12, 2016 at 12:31:49PM +0100, Ian Jackson wrote:
> Josh Triplett writes ("Re: [ANNOUNCE] git-series: track changes to a patch
> series over time"):
> > On Thu, Aug 11, 2016 at 01:16:04PM +0100, Ian Jackson wrote:
> > > My biggest question therefore is: how does your tool compare to
> > > stgit ? Why should we use your tool rather than stgit ?
> >
> > While stgit does track the history of changes made to the stack, as far
> > as I can tell, it doesn't do so in a manner meant for interchange
> > between users. stgit works locally for one user, but doesn't seem to
> > support multiple users. And the history of the patch stack doesn't
> > include commit messages, nor does it group changes into logical commits.
> > It seems more like the reflog (a tool to rescue old bits) than a
> > historical record.
>
> I don't understand what distinction there is between multiple users
> and multiple development efforts by the same user. Or maybe I don't
> understand what you mean by `support multiple users'.

I mean multiple users collaborating on a single patch series in a
distributed way, or even a single user developing the patch series and
multiple users wanting to consume and view the history of that series,
or a single user developing a patch series and using more than one
development system to do so.

stg can publish/format the patches themselves, but doesn't have a
documented format for publishing the history of the patch series. It
keeps some internal records of that history, but those records don't
have the concept of commit messages, well-defined points at which the
series works, or anything you'd expect from a version control history.
Certainly not something you'd want to publish. Hence my comparison to
reflog. If you did enough digging, you could probably rescue old
versions of the patch series, just as you can recover rebased-away heads
from reflog for a while, but you'd have to do a lot of sifting through
whatever random intermediate steps of patch shuffling you might have
done.

> stg publish seems to be the tool you use for sharing stg branches.

Not as far as I can tell. stg publish seems write-only (with no
symmetric command to recover the full patch series, let alone its
history), and focused exclusively on producing a fast-forwarding branch
that has the correct tree, not on preserving the exact state *and*
history of the patch series. For instance, if you delete or edit a
patch from the series, stg publish (as documented at
https://stgit.org/stg-publish.html) "creates a new commit on the public
branch having the same tree as the stack but the public head as its
parent". How would you turn the result back into a patch series? You'd
have to do some manual archaeology, made more complex if you've made any
other changes to the series at the same time. And none of that published
history (with or without transformation) would work to push to the
upstream project, or for the upstream project to "git pull", or even to
"git format-patch" into a patch series.
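
To make the publish behaviour quoted above concrete, here is a toy
Python model. It is not stgit code; the Commit class, build_stack,
publish and changed_paths are hypothetical names invented for
illustration, and it assumes for simplicity that the public branch was
first created at the top of the v1 stack. Editing one patch and adding
another collapse into a single commit whose only parent is the old
public head, so the patch boundaries are gone.

```python
from dataclasses import dataclass

# Hypothetical toy model, not stgit's implementation. A tree is a sorted
# tuple of (path, content) pairs so that it stays hashable.
Tree = tuple


@dataclass(frozen=True)
class Commit:
    tree: Tree
    parents: tuple
    message: str


def make_tree(files: dict) -> Tree:
    return tuple(sorted(files.items()))


def build_stack(base: Commit, patches: list) -> Commit:
    """Apply (message, {path: content}) patches in order; return the top commit."""
    head = base
    for message, changes in patches:
        files = dict(head.tree)
        files.update(changes)
        head = Commit(make_tree(files), (head,), message)
    return head


def publish(public_head: Commit, stack_top: Commit) -> Commit:
    """New commit with the stack's tree and the old public head as sole parent."""
    return Commit(stack_top.tree, (public_head,), "published stack")


def changed_paths(commit: Commit) -> set:
    """Paths whose content differs between a commit and its first parent."""
    old, new = dict(commit.parents[0].tree), dict(commit.tree)
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


base = Commit(make_tree({"README": "v0"}), (), "upstream")
v1 = build_stack(base, [("fix a", {"a.c": "a1"}), ("fix b", {"b.c": "b1"})])
public = v1  # assumption: the public branch was first created at the v1 stack top

# Edit patch "fix a" and append a new patch "fix c", then publish again.
v2 = build_stack(base, [("fix a", {"a.c": "a2"}), ("fix b", {"b.c": "b1"}),
                        ("fix c", {"c.c": "c1"})])
public = publish(public, v2)

print(public.message, sorted(changed_paths(public)))
```

One commit touching a.c and c.c is all that reaches the public branch;
whether that came from editing one patch and adding another, or from
some other rearrangement, is no longer visible.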

> NB I'm not much of an stg user.

I used it a long time ago, but I haven't used it in years.

> > > My next question is: how do you handle merging of changes made in
> > > parallel in different meta-branches of the same series ? I don't mean
> > > just aggregating patches, but other common operations such as:
> > > reordering of patches; editing patch commit messages (or the cover
> > > letter); splitting and merging patches; git rebase --autosquash; etc.
> > >
> > > I didn't see anything in the docs about this. And I confess I didn't
> > > run your code to do any experiments.
> >
> > git-series does support merge commits within the series branch; see the
> > section "git-series commits" in INTERNALS. Right now, git-series
> > doesn't create those merge commits for you, but I plan to add a
> > mechanism to support that. That'll probably start out as "here's two
> > patch series, tell me when you've finished creating the merged version
> > and I'll commit it", though I could imagine handling many simple cases
> > more automatically. I hope that building a simple tool and
> > incrementally improving it will work.
>
> I think this is the key area of difficulty which stops people sharing
> patch series as much as they like, at least as much as the lack of a
> fast-forwarding view.

Agreed. Having a format to store and interchange patch series history
seems like a necessary first step. Making it possible to store merges
seems like another. Making it possible to *produce* them would help
further. One step at a time, though. :)

> > > I did read the INTERNALS document about the data structures. I wonder
> > > why you rejected other possibilities. In particular, your top level
> > > `git series' branch data structure is not directly useable by any
> > > other tool; it needs to be dereferenced/converted, to produce a
> > > useable commit. Did you consider recording the metadata as dotfiles
> > > in tree objects, or some such ?
> >
> > I started with a few fundamental constraints:
> > - The commits tracked by the series *must* remain directly usable as
> >   commits in the underlying project, whether by sending patches or by
> >   pushing/pulling.
> > - git must find every object in the history of a series reachable from a
> >   ref, so that fsck/repack/prune/etc cannot discard series history.
> > - Similarly, `git push` and `git fetch` must work on series commits, and
> >   must transmit/receive the full series history with a series branch,
> >   without requiring any additional commands or special "series" versions
> >   of push/fetch.
> >
> > These constraints limit where metadata can live. Adding any dotfiles to
> > the commits in the patch series would mean the resulting patches would
> > include those dotfiles. Any metadata added to commit messages would end
> > up in patches; note that several projects, including the Linux kernel,
> > have complained about patches that include Gerrit "Change-Id" tags. Any
> > format that stored patches within a series commit, rather than full
> > links to commits for the patches, would not leave the commits themselves
> > usable by git.
>
> The usual approach taken by other patch stack tools is to treat picky
> upstreams, like people who object to Change-Id, as an output format.
>
> Those picky upstreams are likely to rewrite (or reapply) a series, so
> what ends up in the upstream tree won't be the same commit objects
> (and perhaps not the same tree objects) anyway.

As mentioned above, I specifically started from the constraint that the
series tracks commits, not just trees; the commits themselves, complete
with their commit IDs, are among the artifacts the series tracks
untouched. While email workflows effectively turn into "git am", which
amounts to a rebase (though `git format-patch` has started to address
that with the introduction of base-commit metadata), "please pull"
workflows typically involve merges of your actual commits.
(That includes "git request-pull", "git series req", or a
GitHub/GitLab/etc pull request.)

> > Do you see another possible storage format that meets all the
> > constraints above?
>
> Well, there is the obvious "pseudo-merge" convention: each patch
> series tip is, when published, merged with -s ours with the previous
> published version.
>
> You do have to strip the pseudo-merge before starting work with
> git-rebase, and then reapply it afterwards, but that is not
> particularly difficult (and some tooling would help).

Not just git-rebase; that approach requires using special tools around
any git command that operates on the actual patch series. You can't
just run "git cherry-pick", "git rebase", "git rebase -i",
"git commit --amend", "git am", or even just "git commit" without first
un-applying the merge commit. (Analysis tools like "git bisect" or
"git blame" will also find themselves unhappy.)

Pulling the commit out of a pseudo-merge doesn't seem any more or less
difficult than pulling out the "series" entry from a git-series commit;
either way, you have a wrapping commit for metadata from which you need
to extract the underlying commit you want. And the "pseudo-merge"
convention doesn't track a cover letter, the base of a series, or any
other metadata.

In addition, a "pseudo-merge" encodes non-trivial metadata into the
parent list of commits, making it more difficult to handle things like
merges between meta-commits, or, conversely, the history of patch series
whose commits themselves include merges. How can you follow the history
of such a patch series, and tell the difference between meta-commits and
the commits of the series itself?

Some of those problems seem fixable; you could define a precise format
based on pseudo-merge commits, including all the same metadata, a
precise definition of which parents refer to other pseudo-merges and
which ones refer to versions of the series, and so on. I don't see the
advantage of such a format, though.
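
A toy Python model may help illustrate two of the points above: why
series metadata has to reach git through parents rather than only
through tree entries (the reachability constraint quoted earlier), and
why a bare pseudo-merge looks just like a real merge. The Commit class,
its gitlinks field and the helper functions are invented for
illustration. The git-series-like commit's extra parents (series tip
and base) loosely follow the INTERNALS description, but the exact parent
order here is an assumption. The reachable() walk mirrors git not
following gitlink (submodule-style) tree entries.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: each object is a distinct commit
class Commit:
    message: str
    parents: list = field(default_factory=list)
    # Hypothetical: tree entries that point at commits (gitlinks), by name.
    gitlinks: dict = field(default_factory=dict)


def reachable(tip):
    """Commits reachable from tip via parents only; gitlinks are not followed,
    as in git's own reachability walk (fsck, prune, push, fetch)."""
    seen, todo = set(), [tip]
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(commit.parents)
    return seen


base = Commit("upstream release")
p1 = Commit("patch 1", [base])
p2 = Commit("patch 2", [p1])

# Metadata stored only as gitlinks: the patches are invisible to a parent walk.
links_only = Commit("series v1", [], {"series": p2, "base": base})
# Same gitlinks plus the referenced commits as extra parents (git-series-like).
series_v1 = Commit("series v1", [p2, base], {"series": p2, "base": base})

# Pseudo-merge convention: "git merge -s ours" of the old published version
# from the new series tip, giving parents [new tip, old published version].
q2 = Commit("patch 2, revised", [p1])
pseudo_v2 = Commit("pseudo-merge", [q2, p2])  # assumes v1 was published as p2
# A genuine merge inside a patch series has exactly the same shape.
side = Commit("side branch", [base])
real_merge = Commit("merge side branch", [q2, side])


def looks_like_pseudo_merge(commit):
    # With no explicit marker, only the parent list is available to go on.
    return len(commit.parents) == 2


print(p2 in reachable(links_only), p2 in reachable(series_v1))
print(looks_like_pseudo_merge(pseudo_v2), looks_like_pseudo_merge(real_merge))
```

Extracting the series tip is a one-liner either way
(pseudo_v2.parents[0] versus series_v1.gitlinks["series"]), but only an
explicit convention distinguishes pseudo_v2 from real_merge.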

> I intend to provide some tooling support this workflow, because I
> think this workflow would work well with dgit. It produces a
> fast-forwarding branch containing the intended output tree objects.

Tree objects alone don't suffice. They provide enough information to
extract a source tree, but then, so do archived source tarballs. That no
longer suffices for collaborative development processes; for that, you
need commit objects.

> Series cover letters are less important for Debian so I haven't
> thought about that much but the obvious answer is to have an "empty"
> commit at the base of the stack.

Git really despises empty commits and does its best to destroy them at
every turn. (This partly comes from trying to drop already-applied
commits when rebasing onto a newer upstream.) I would not recommend
putting any data you value into an empty commit. In addition, this would
make the patch series completely unusable with a "git pull" or
"git push" workflow, since upstream would not want that empty commit in
its history; you could *only* send such a patch series via email.

> > For repositories, you can push the series branch directly if you want to
> > provide the history of your series, or you can push the current version
> > (or an older version) of the patch series if you just want to publish
> > that version.
>
> Neither of these is compatible with dgit, of course.

The former seems the easiest to interoperate with: dgit could easily
learn to receive a series commit and turn it into a source package ready
for upload. In addition, the git-series format has the advantage of
chaining back to the upstream git history; the "base" of each series
commit refers to an upstream commit, and the "series" provides a patch
series on top of that. Most git repositories for packaging end up using
unusual mechanisms to interoperate with upstream version history, or
they don't do so at all.
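
As a rough sketch of what chaining back to upstream history gives a
consumer: given a series commit's "base" and "series" entries, the patch
list is just the commits reachable from the series tip but not from the
base, much like `git rev-list --reverse base..series` for a linear
series. The Python below is a hypothetical toy model (the Commit class,
the dict standing in for a series commit, and patches_between are
invented for illustration), not git-series or dgit code.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: each object is a distinct commit
class Commit:
    message: str
    parents: list = field(default_factory=list)


def ancestors(tip):
    """tip plus every commit reachable from it through parents."""
    seen, todo = set(), [tip]
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(commit.parents)
    return seen


def patches_between(base, tip):
    """Commits reachable from tip but not from base, parents before children."""
    excluded = ancestors(base)
    ordered, visited = [], set()

    def visit(commit):
        if commit in excluded or commit in visited:
            return
        visited.add(commit)
        for parent in commit.parents:
            visit(parent)
        ordered.append(commit)  # post-order: appended after all its parents

    visit(tip)
    return ordered


# Upstream history, then a three-patch series on top of upstream "v2".
v1 = Commit("upstream v1")
v2 = Commit("upstream v2", [v1])
pa = Commit("patch a", [v2])
pb = Commit("patch b", [pa])
pc = Commit("patch c", [pb])

# Hypothetical stand-in for a series commit's "base" and "series" entries.
series_commit = {"base": v2, "series": pc}
patch_list = patches_between(series_commit["base"], series_commit["series"])
print([c.message for c in patch_list])
```

The upstream commits stay in the same history as ordinary ancestors of
the base, so nothing packaging-specific is needed to connect the patch
series to them.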