> Am I correct to understand that the reason why a commit object is
> (ab|re)used to represent a meta-commit is because by doing so we
> would get connectivity (i.e. fetching & pushing would transfer all
> the associated objects along) for free, and by not representing it
> as a new and different object type, existing implementations can
> just pass them along without understanding what they are, and as
> long as these are not mixed as parts of the main history of the
> project (e.g. when enumerating commits that has aa7ce5 as its
> parents, because somebody else obsoleted aa7ce5 and you want to
> evolve anything that built on it, you do not want to mistake the
> above "meta" commit as a commit that is part of the ordinary history
> and rebuild on top of the new version of aa7ce5, which would lead to
> a disaster), everything would work just fine?

Yes, sir. That's it exactly. My first draft of the proposal suggested
creating a new top-level object type, but when I started digging
through the code I realized that the new object was so similar to a
commit that there was no need for a new type.

> Perhaps you'd use something like "presence of parent-type header
> marks that a commit is a meta-commit and not part of the main
> history".

Yes, that's called out explicitly as part of the proposal (see the
first sentence in the Parent-type subsection). Fsck would enforce this
invariant.

> How are these meta commits anchored so that it won't be reclaimed by
> repack?

They would either be anchored by a ref in the metas/ namespace (for
active changes currently under consideration by evolve) or by the
reflog (for recently deleted changes).

> I do not see any "parent" field used to chain them together,

They point to one another using the usual "parent" field present in
all commit objects. For an example of what the raw struct would look
like with parent pointers, see the top of the "Detailed design"
section or search the doc for the string <example_meta_commit>. For
examples of how the metacommits in a change graph would be connected
after various operations, see the "Commit" section and the "Merge"
section. Please let me know if any of these examples are
insufficiently explained or if there's any other examples you'd like
to see.

> but I do not think we can afford to spend one ref per meta
> commit, as refs are not designed to point into each and every object
> in the repository.

Agreed. This is actually one of the reasons I'm proposing the use of
chains of meta-commits as opposed to using a purely ref-based
approach. I describe several other ref-based approaches in the "Other
options considered" section, and I made essentially the same point
there. We only create refs in the metas/ namespace to point to the
head of each change, and the rest of the commits and metacommits used
by the graph are reached via the parent pointers in the metacommits.

> I have a moderately strong opposition against "origin" thing.  If
> aa7ce555 replaces d664309ee, in order for the tool to be able to
> "evolve" other histories that build on top of d664309ee, it only
> needs the history between aa7ce555 and d664309ee and it would not
> matter how aa7ce555 was built relative to its parent.

I see I haven't justified the "origin" thing well enough. I'll
elaborate in the document, but here's the short version. The "origin"
edges are needed to address several use-cases:

1. Gerrit needs to know about cherry picks.

This is one of the lesser-known things that it uses the change-id
footers for and if we want to be able to eliminate the gerrit
change-id footers we need to record and communicate information about
cherry-picks somehow. This is the main reason for the origin edges -
the early drafts of this proposal didn't have them but it came up when
I asked a kind Gerrit maintainer to whether the proposal would be
sufficient to eliminate gerrit's change-ids. However there may be
alternatives I didn't think of. If we were to omit the origin edges,
can you suggest an alternative way for git to record the fact that one
commit was cherry-picked from another and communicate this fact to
gerrit?

I see that I forgot to call out "replacing gerrit change-ids" as an
explicit goal. I'll add that to the doc.

2. Obsolescence across cherry-picks.

In your example, it *may* actually matter how aa7ce55 was constructed.
One such scenario is what I'm calling obsolescence across
cherry-picks. Let me describe the use-case for it:

Alice creates commit A1.

Bob cherry-picks A1 to another branch, producing B1. At this point,
Bob has a metacommit saying that A1 is the origin of B1.

Alice amends A1, producing A2. She shares this with Bob.

At this point, Bob probably wants to amend B1 to include whatever
bugfix Alice did in A2 since the thing he cherry-picked is now out of
date. That's what the obsolescence across cherry-picks feature does.
If bob runs evolve with this option enabled, the evolve command will
produce B2 by amending B1 with whatever diff Alice did between A1 and
A2... and this only works if we have origin edges. Without the origin
edge, the evolve command wouldn't know that B1 came from the now
out-of-date A1. The commit B2 that results from this would have both
origin and replacement edges. It replaces B1 but it was formed by
cherry-picking A2.

I'm currently unsure if obsolescence across cherry-picks should be on
or off by default. I was thinking of making it off-by-default
initially and then possibly flipping the default after users have a
chance to try it and give feedback.

3. Merge-base.

Origin edges will always point to a better merge base (ancestor for
three-way merges) than the content's parent. For example, consider
doing a cherry-pick followed by a rebase:

$ git checkout -b source
# Stage some stuff
$ git commit -m "A" && git tag A
# Stage some more stuff
$ git commit --amend -m "B" && git tag B
$ git checkout -b dest && git reset --hard HEAD^1
$ git cherry-pick A
$ git checkout source
$ git rebase dest

We cherry-picked an old version of a change, and then rebased a new
version of that same change onto the a branch containing the old one.
With today's code we'd pick A's parent commit as the common ancestor
and the user would have to resolve a bunch of merge conflicts since
both A and B are versions of the same patch and touch a lot of the
same lines. We know that B is the newer version but the merge tool
doesn't know that B is newer than A. If we had origin edges, we would
know that the latest commit on the dest branch was a cherry-pick of A
and would traverse its origin edge before the content edge to look for
ancestors. That would select A as the common ancestor, and since B
applies cleanly on top of A there would be no conflicts and the
automerge would most likely succeed.

Now, this may seem like a crazy thing to do. Why would you rebase two
different versions of the same change on top of one another? This
scenario is likely to crop up when users start using the change graph
to collaborate on the same WIP change. They'll be using rebase, merge,
and cherry-pick to resolve divergence and incorporate changes from
other users... so rebases and cherry-picks of different versions of
the same change would be commonplace.

> The user may
> have typed/developed it from scratch, the user may have borrowed 70%
> of its change from 7e1bbcd while remaining 30% was done from
> scratch, or it was a concatenation of the change made in 7e1bbcd and
> another commit.

I don't think the amount they developed from scratch invalidates any
of the three main use-cases (above). Even if we have no idea how many
manual edits were made, the origin parents would still be useful for
communication with gerrit, locating a better merge base, and giving
the user the option of obsolescence across cherry-picks. The proposal
calls for commands like squash merge to create multiple origin
parents, so concatenations *would* be recorded accurately if there was
ever a reason to treat them differently (I'm not sure any of my 3 main
use-cases need to).

> One half of my point being that we can do _without_ it, and in all
> cases, aa7ce555, if leaving the fact that it was derived from
> 7e1bbcd is so important, can mention that in its log message how it
> relates to the "origin" thing.

Log messages would be sufficient for communicating cherry-picks to the
user, but wouldn't address any of the driving use-cases for origin
parents which require it to be machine-readable.

Now, admittedly the obsolescence-across-cherry-picks and merge-base
use-cases are minor features due to the fact that cherry-picks and
squash merges are themselves uncommon. A lot of users would probably
never notice the difference. However, it would be very disappointing
if gerrit's change-ids needed to stick around just for the sake of
this one missing corner case.

> And the other half is that while I consider the "origin" thing is
> unnecessary for the above reasons, having it means we need to not
> just transfer the history reading to aa7ce555 and d664309ee (which
> are necessary anyway while we have histories to transplant from
> d664309ee to aa7ce555) but also have to pull in the history leading
> to 7e1bbcd and we cannot discard it.

I'll assume that by "history" you're referring to the change graph
(the metacommits) and not the branches (the commits), which would have
no origin edges or connection between replacements.

If the user has kept a change around in their metas namespace, it's an
indication that they (or their collaborators) are still working on it
and want its history to be retained. I don't necessarily see this as a
problem because if collaborators are still editing a change that the
local user cherry-picked, it's plausible that the change may be the
subject of a future obsolescence-over-cherry-pick in which case having
the history around is necessary.

You're right that our default position should be not to retain extra
objects unless there's a compelling reason to do so, and this proposal
should have explained that reason. Now that I've explained the reason
do you still have a strong objection to the "origin" parents, or have
I overlooked a use-case?

  - Stefan

Reply via email to