I have written out how I think a large part of the symmetric merge algorithm should go, in more detail. Review and other feedback would be welcome.
We need to write out the algorithm in this level of detail and a little more detail, and then write several new functions for implementing parts of it. I had until recently been thinking that I would be able to re-work most of the existing functions to accomodate what we need, and I have made some improvements along the way, but that is proving too difficult. (I started making some notes in <http://wiki.apache.org/subversion/MergeCodeImprovements> about things that I think we should be looking to change in the existing merge code.) At this point I haven't included processing of subtree mergeinfo in the algorithm description, though I have that in mind too. I hope your mail viewers preserve indentation and line breaks, as I've used indentation (space characters only) for formatting. Symmetric Merge Algorithm 1. Find the latest rev of A synced to B and of B synced to A. 2. Choose a base. 3. Identify cherry-picks. 4. Break into 3-way merges, skipping the cherry-picks. 5. Perform the 3-way merges and mergeinfo addition. In more detail. 1. Find the latest revision of A synced to B and the latest revision of B synced to A. This means, for the A to B direction, the youngest revision 'rN' on A at which all revisions of A, up to and including rN, are now recorded as having been merged into B. And "now recorded" means recorded in the version of B that is being used as the merge target. Usually the result corresponds to the most recent merge into B from A, but not necessarily, because later revisions from A might previously have been cherry-picked into B. We consider only direct A <-> B mergeinfo. (Consideration of a merge graph extending to other branches is future work. Since we record mergeinfo transitively, considering only direct A<->B mergeinfo already suffices for some 3-branch scenarios too, but not all such scenarios.) In: A_tip: pathrev B_wc: wc-abspath (Note: B is a WC "actual" tree including any local mods.) Out: A_base: pathrev B_base: pathrev (Note: B_base can't possibly be B if B has local mods.) Implementation: Currently: find_base_on_source() find_base_on_target() TODO: fill in the details here and re-visit the implementation. 2. Choose a base. Choose the latest revision of A synced to B or the latest revision of B synced to A. Each candidate base is a common ancestor of the source and target, if ancestry means following either the direct line of descent or the arrows that indicate complete merges. Choose the 'more recent' one in some sense -- be it time of commit, or number of changes since then, or some other measure. [TODO] In what cases do the selection methods give different results? Only after a criss-cross merge? In: A_base: pathrev B_base: pathrev Out: base: pathrev Implementation: Currently, in find_symmetric_merge(): (A_base->rev > B_base->rev) ? A_base : B_base 3. Identify cherry-picks. Find changes along the source side of the merge that are already 'in' the target. Look for merges in both directions (from current source to current target ("forward") and from current target to current source ("backward")). For each merge: Break down the merge into its component logical changes. If the merge consists entirely of logical changes that are not in the target, or consists entirely of logical changes that are in the target, treat it as a unit -- a cherry pick. Otherwise, fall back to the user: report to the user which logical changes that merge includes, and suggest that the user run more specific merges that pull only the subset of logical changes they want. 3a. Create an (implicit or explicit) list of indivisible changes along each side of the merge, since the BASE. For example: Source side Target side B@10:A@15 B@10:B@11 (B@10 is BASE) A@15:A@20 B@11:B@12 ... ... A@26:A@27 B@29:B@30 B@30:B_wc (B@30 is B_wc's origin) In: base A_tip B_wc Out: A_changes: a list of (URL1@REV1 : URL2@REV2) pairs? Or, more compactly, BASE plus a list of REVs that index into the branch history? B_changes: the same Note: The first change in one of the lists might be a cross-branch change (from 'BASE' on the target branch to 'MID' on the source branch), whereas all subsequent changes are changes along a branch. Note: The B_changes output needs to be able to represent B_wc as its end point, implicitly or explicitly. Note: These lists need to reference changes in such a way that we can fetch and examine the changes, at least in terms of their mergeinfo and whether they're operative. TODO: Should the outputs identify each individual change (revision) as operative or inoperative, or is it acceptable to output revisions (or ranges of revs) that are not all known to be operative? 3b. Discover the changes from A that have been merged to B since BASE. In: A_changes B_changes Out: A_changes_to_skip: a subset of A_changes A_changes_to_warn: a subset of A_changes Implementation phase 1: This detects direct cherry-picks in the same direction as the present merge, which is all that Subversion 1.7 supports. Fetch mergeinfo on B_wc; note any changes from A that are recorded as merged to B (since BASE, and directly) as "already on B". Implementation phase 2: This adds detection of cherry-picks in the opposite direction, which is new functionality. For each change in A_changes: Fetch the mergeinfo change. If the mergeinfo did change (it represents a merge into A): If that merge was a direct merge from B, and from nowhere else, add this A change onto A_changes_to_skip. If that is, instead, a merge from both B and other sources, then add this A change onto A_changes_to_warn. Add this to the phase 1 results for A->B. Note: If the first change is a BASE:MID cross-branch change, it must first be turned into the equivalent series of source-side changes, which is a non-trivial exercise. Implementation phase 3: This detects both direct and indirect (that is, via a third branch) cherry-pick merges, in both directions. This is more complex, and a simple implementation of it may run slowly in some scenarios. For each change in A_changes: Fetch the mergeinfo change. If the mergeinfo did change (it represents a merge into A), break down that merge into its component logical changes, else take this change on A as the logical change to test. See if those logical changes are already on B. If all of those logical changes are on B, add this A change onto A_changes_to_skip; if some of them are on B, add this A change onto A_changes_to_warn. 4. Break into 3-way merges, skipping the cherry-picks. Implementation: # Define a merge as (base, tip, target). merges = [(BASE, A_tip, B_wc)] for change in A_changes_to_skip: m = find change.rev in merges replace [m] with [ (m.base, change-1, m.target), (change, m.tip, m.target) ] # elide either or both of those if they are no-ops 5. Perform the 3-way merges and mergeinfo addition. Implementation: for m in merges: three_way_merge(m.base, m.tip, m.target) if m.base is on A: do a normal mergeinfo addition else: mergeinfo += (all source revs from YCA(?) up to m.tip) - Julian