Subtree mergeinfo -- what I learnt at the Hackathon

Julian Foad Wed, 20 Jun 2012 08:00:13 -0700

Warning: long email.  Merge gurus and enthusiasts please comment!

At the Hackathon last week, the biggest single topic I discussed with others 
was the handling of subtree merges: both the current handling (which Paul told 
me a lot about), and how we could model them in a more pure/general/consistent 
way in future (which Brane helped me think about).


This is my understanding.


SUMMARY

The current handling is that a normal 'sync' merge will merge any eligible 
revisions from each subtree.  It uses relative-path equivalence as the 
definition of the 'corresponding' subtree.

For the future, it may be better to match subtrees in one branch of the merge 
to subtrees in the other branch, with the correspondence determined by 
following their own lines of history, including renames.  The merging of a 
subtree would then be consistent with what would currently happen if you choose 
to merge that subtree by itself, as the root of a merge -- that is, its 
location history *would* be followed.

We need to decide about how the 'Symmetric merge' should handle subtrees, 
initially.  This should be made in the light of both the need for backward 
compatibility and thought about where the merge capability is headed.


CURRENT HANDLING OF SUBTREE MERGEINFO

An ordinary 'sync' merge that considers mergeinfo will consider mergeinfo on 
subtrees as well as mergeinfo on the merge target root node.  For each target 
subtree, it will merge any 'eligible' revisions from the corresponding subtree 
in the source.

  Example:
    with a merge-root node (named A in the source, B in the target,
    and O at the YCA), and a subtree (named 'foo' at all times):

  A         [x] --- [x] --- [x] -------------------
  A/foo     [x]     [x]     [x]                 \
           /                   \ r7: some        \ r8:
  O     [ ] r4:     r5:     r6: \ set of          \sync
  O/foo [ ] cp+mod  mod     mod  \ subtree         \merge
           \                      \ merges          \
  B         [x] ------------------ [x] ------------ [x]
  B/foo     [x]                    [x]              [x]
  mergeinfo ()                     (/A:5            (/A:4-6)
                                    /A/foo:6)       # foo inherits

The definition of a 'corresponding subtree' is the set of node-revisions that 
have the same, constant, relative path below the merge source root and the 
merge target root.

While the two lines of history linking the root node of the merge source to 
that of the merge target (via their YCA) will follow any copies (and thus any 
renames) in their histories, the subtree is considered to be whatever node 
exists at the given relative path under that root.  If there are revisions at 
which such a subtree node does not exist at the given relative path, then those 
revisions are not considered eligible for merging.

  Example:
    now with an initial subtree named 'foo', deleted in r5 in
    branch A, and replaced in r6 by a new 'foo' copied from 'bar'
    (where 'bar' is assumed to exist somewhere but not shown here):

            r4      r5      r6     r7               r8

  A         [x] --- [x] --- [x] -- [x] ------------
  A/foo    /[x] --- [D]         __ [A] ------------
          / /                  /                 \\ r8:
  O     [ ]/            .../bar                   \\ sync
  O/foo [ ]\                                       \\ merge
          \ \                                       \\
  B        \[x] ----------------------------------- [x]
  B/foo     [x] ----------------------------------- [x]

In the r8 merge here, the following changes are merged to the target's subtree 
'foo':

  r4 A/foo  -- mod
  r5 A/foo  -- delete
  r6        -- no, not eligible for merge
  r7 A/foo  -- add

I'm not yet clear on the details -- such as whether this would be merged as a 
single diff between A/foo@3 and A/foo@7 (ignoring ancestry) or would be broken 
into multiple revision ranges (because of the gap at r6, and/or because of 
A/foo@3 being unrelated to A/foo@7).


INCONSISTENCY

If we look at the diagram above and, instead of the merge shown for r8, imagine 
we run

  svn merge ^/A/foo wc-B/foo

then it should be clear that the merge will (attempt to) find all changes along 
the copy-history of A/foo@7, back to its youngest common ancestor with B/foo@7, 
and consider those as the revisions/changes eligible for merging.

But if we run the merge shown in the diagram,

  svn merge ^/A wc-B

then the merge will not consider changes along the history of A/foo before r7, 
where that "same node" was named A/bar.

This difference seems undesirable.  It's not that the current behaviour is 
necessarily wrong, it is just what it is, but let's see if we can improve it.


PAIRING SUBTREES BY THEIR OWN ANCESTRY

For the future, it may be better to trace each subtree in one branch of the 
merge to a corresponding subtree in the other branch, with the correspondence 
determined by following renames.

This would enable the following kinds of history to be followed:

* Subtree 'foo' has been renamed in one or both branches:

  A       [A] ------ [ ] ---
  A/foo   [A]        [x]    \
          /                  \ merge:
  O     [ ]                   \ matches
  O/foo [ ]                    \ A/foo to B/bar
          \      mv foo bar     \
  B       [A] --- [ ]--------- [ ]
  B/foo   [A]     [D]
  B/bar           [A] --------- [x]


* Subtree 'foo' was created after branching, so its youngest common ancestor is 
later than the merge-root's YCA:

  A       [A] --- [-] --------- [x]
  A/foo           [A] --------- [x]
          /          \             \ merge:
  O     [ ]           \ merge:      \ matches
                       \ adds foo    \ A/foo to B/foo
          \             \             \
  B       [A] --- [D] --- [-] ------ [x]
  B/foo                   [A]        [x]


* Subtree 'foo' did not exist at the YCA but was created initially in some 
other branch and then copied into both A and B.  The YCA of the subtrees A/foo 
and B/foo is not itself a subtree of O, nor of A, nor of B, but simply some 
other location in the repository.

Notice that all of the interesting cases (where behaviour would differ from the 
current system) have to do with renames.

The location history of the merge *root* node can be uniquely determined by 
tracing only backwards, because the source root node and target root node are 
specified in advance.  In contrast, subtrees of the source and target root 
nodes cannot be uniquely paired using copy history alone because (in general) 
more than one copy may exist of any node.  Complete pairing requires us to have 
a way to follow *renames* (as distinct from merely copies), whether 
automatically or with manual input.

Therefore it seems we need to design this model of subtree handling on top of 
an assumption that information about renames will be available.


HOW SHOULD SYMMETRIC MERGE BEHAVE?

What does this mean for how the initial implementation of symmetric merge 
should behave?

My intention so far has been to make the 'sync-like' direction of symmetric 
merge (that is, same direction as previous merge) behave as a 
backward-compatible replacement for the current 'sync' (that is, 
non-reintegrate, merge-tracking, unspecified minimum revision) merge.

However, I don't want to fence us in to a new behaviour with the same 
backward-compatibility guarantee with regard to subtrees.  So now I'm thinking 
maybe a plain symmetric merge should error out if there are subtrees (that have 
different eligible revision ranges from the root).  What would the command-line 
UI look like?  A plain "merge" errors if subtree merges are needed, while an 
option forces it to handle them (in the current fixed-relpath way)?  And then 
in the future we remove the restriction if we come up with a good way to handle 
them?



That's enough for this email.

Please let me know your thoughts.

- Julian

Subtree mergeinfo -- what I learnt at the Hackathon

Reply via email to