> Am I correct to understand that the reason why a commit object is > (ab|re)used to represent a meta-commit is because by doing so we > would get connectivity (i.e. fetching & pushing would transfer all > the associated objects along) for free, and by not representing it > as a new and different object type, existing implementations can > just pass them along without understanding what they are, and as > long as these are not mixed as parts of the main history of the > project (e.g. when enumerating commits that has aa7ce5 as its > parents, because somebody else obsoleted aa7ce5 and you want to > evolve anything that built on it, you do not want to mistake the > above "meta" commit as a commit that is part of the ordinary history > and rebuild on top of the new version of aa7ce5, which would lead to > a disaster), everything would work just fine?
Yes, sir. That's it exactly. My first draft of the proposal suggested creating a new top-level object type, but when I started digging through the code I realized that the new object was so similar to a commit that there was no need for a new type. > Perhaps you'd use something like "presence of parent-type header > marks that a commit is a meta-commit and not part of the main > history". Yes, that's called out explicitly as part of the proposal (see the first sentence in the Parent-type subsection). Fsck would enforce this invariant. > How are these meta commits anchored so that it won't be reclaimed by > repack? They would either be anchored by a ref in the metas/ namespace (for active changes currently under consideration by evolve) or by the reflog (for recently deleted changes). > I do not see any "parent" field used to chain them together, They point to one another using the usual "parent" field present in all commit objects. For an example of what the raw struct would look like with parent pointers, see the top of the "Detailed design" section or search the doc for the string <example_meta_commit>. For examples of how the metacommits in a change graph would be connected after various operations, see the "Commit" section and the "Merge" section. Please let me know if any of these examples are insufficiently explained or if there's any other examples you'd like to see. > but I do not think we can afford to spend one ref per meta > commit, as refs are not designed to point into each and every object > in the repository. Agreed. This is actually one of the reasons I'm proposing the use of chains of meta-commits as opposed to using a purely ref-based approach. I describe several other ref-based approaches in the "Other options considered" section, and I made essentially the same point there. We only create refs in the metas/ namespace to point to the head of each change, and the rest of the commits and metacommits used by the graph are reached via the parent pointers in the metacommits. > I have a moderately strong opposition against "origin" thing. If > aa7ce555 replaces d664309ee, in order for the tool to be able to > "evolve" other histories that build on top of d664309ee, it only > needs the history between aa7ce555 and d664309ee and it would not > matter how aa7ce555 was built relative to its parent. I see I haven't justified the "origin" thing well enough. I'll elaborate in the document, but here's the short version. The "origin" edges are needed to address several use-cases: 1. Gerrit needs to know about cherry picks. This is one of the lesser-known things that it uses the change-id footers for and if we want to be able to eliminate the gerrit change-id footers we need to record and communicate information about cherry-picks somehow. This is the main reason for the origin edges - the early drafts of this proposal didn't have them but it came up when I asked a kind Gerrit maintainer to whether the proposal would be sufficient to eliminate gerrit's change-ids. However there may be alternatives I didn't think of. If we were to omit the origin edges, can you suggest an alternative way for git to record the fact that one commit was cherry-picked from another and communicate this fact to gerrit? I see that I forgot to call out "replacing gerrit change-ids" as an explicit goal. I'll add that to the doc. 2. Obsolescence across cherry-picks. In your example, it *may* actually matter how aa7ce55 was constructed. One such scenario is what I'm calling obsolescence across cherry-picks. Let me describe the use-case for it: Alice creates commit A1. Bob cherry-picks A1 to another branch, producing B1. At this point, Bob has a metacommit saying that A1 is the origin of B1. Alice amends A1, producing A2. She shares this with Bob. At this point, Bob probably wants to amend B1 to include whatever bugfix Alice did in A2 since the thing he cherry-picked is now out of date. That's what the obsolescence across cherry-picks feature does. If bob runs evolve with this option enabled, the evolve command will produce B2 by amending B1 with whatever diff Alice did between A1 and A2... and this only works if we have origin edges. Without the origin edge, the evolve command wouldn't know that B1 came from the now out-of-date A1. The commit B2 that results from this would have both origin and replacement edges. It replaces B1 but it was formed by cherry-picking A2. I'm currently unsure if obsolescence across cherry-picks should be on or off by default. I was thinking of making it off-by-default initially and then possibly flipping the default after users have a chance to try it and give feedback. 3. Merge-base. Origin edges will always point to a better merge base (ancestor for three-way merges) than the content's parent. For example, consider doing a cherry-pick followed by a rebase: $ git checkout -b source # Stage some stuff $ git commit -m "A" && git tag A # Stage some more stuff $ git commit --amend -m "B" && git tag B $ git checkout -b dest && git reset --hard HEAD^1 $ git cherry-pick A $ git checkout source $ git rebase dest We cherry-picked an old version of a change, and then rebased a new version of that same change onto the a branch containing the old one. With today's code we'd pick A's parent commit as the common ancestor and the user would have to resolve a bunch of merge conflicts since both A and B are versions of the same patch and touch a lot of the same lines. We know that B is the newer version but the merge tool doesn't know that B is newer than A. If we had origin edges, we would know that the latest commit on the dest branch was a cherry-pick of A and would traverse its origin edge before the content edge to look for ancestors. That would select A as the common ancestor, and since B applies cleanly on top of A there would be no conflicts and the automerge would most likely succeed. Now, this may seem like a crazy thing to do. Why would you rebase two different versions of the same change on top of one another? This scenario is likely to crop up when users start using the change graph to collaborate on the same WIP change. They'll be using rebase, merge, and cherry-pick to resolve divergence and incorporate changes from other users... so rebases and cherry-picks of different versions of the same change would be commonplace. > The user may > have typed/developed it from scratch, the user may have borrowed 70% > of its change from 7e1bbcd while remaining 30% was done from > scratch, or it was a concatenation of the change made in 7e1bbcd and > another commit. I don't think the amount they developed from scratch invalidates any of the three main use-cases (above). Even if we have no idea how many manual edits were made, the origin parents would still be useful for communication with gerrit, locating a better merge base, and giving the user the option of obsolescence across cherry-picks. The proposal calls for commands like squash merge to create multiple origin parents, so concatenations *would* be recorded accurately if there was ever a reason to treat them differently (I'm not sure any of my 3 main use-cases need to). > One half of my point being that we can do _without_ it, and in all > cases, aa7ce555, if leaving the fact that it was derived from > 7e1bbcd is so important, can mention that in its log message how it > relates to the "origin" thing. Log messages would be sufficient for communicating cherry-picks to the user, but wouldn't address any of the driving use-cases for origin parents which require it to be machine-readable. Now, admittedly the obsolescence-across-cherry-picks and merge-base use-cases are minor features due to the fact that cherry-picks and squash merges are themselves uncommon. A lot of users would probably never notice the difference. However, it would be very disappointing if gerrit's change-ids needed to stick around just for the sake of this one missing corner case. > And the other half is that while I consider the "origin" thing is > unnecessary for the above reasons, having it means we need to not > just transfer the history reading to aa7ce555 and d664309ee (which > are necessary anyway while we have histories to transplant from > d664309ee to aa7ce555) but also have to pull in the history leading > to 7e1bbcd and we cannot discard it. I'll assume that by "history" you're referring to the change graph (the metacommits) and not the branches (the commits), which would have no origin edges or connection between replacements. If the user has kept a change around in their metas namespace, it's an indication that they (or their collaborators) are still working on it and want its history to be retained. I don't necessarily see this as a problem because if collaborators are still editing a change that the local user cherry-picked, it's plausible that the change may be the subject of a future obsolescence-over-cherry-pick in which case having the history around is necessary. You're right that our default position should be not to retain extra objects unless there's a compelling reason to do so, and this proposal should have explained that reason. Now that I've explained the reason do you still have a strong objection to the "origin" parents, or have I overlooked a use-case? - Stefan