On 29 April, Stefan Fuhrmann wrote: > Julian Foad wrote: >> Stefan Fuhrmann wrote: >>> [...] The proposal I make here is not >>> an alternative to e.g. "Symmetric Merge" but rather a >>> refined merge strategy that would also benefit the current >>> merge. [...] >>> In end, we will always merge individual file or property >>> changes but we plan the merge on tree-level. For many >>> file nodes, this will unnecessary intermediate states >>> where the merge got fragmented due to a change to >>> some *other* file. >> >> Yes. That is likely to be a significant cause of extra conflict in cases >> where the user has sub-tree mergeinfo or has done some cherry-pick merges >> and is now doing a "merge everything else" merge. > > It also happens in the "back and forth" case:
If this problem does arise in basic back-and-forth merging then I would agree it's much more important than I thought, but I don't get it. If "it" refers to fragmenting the merge, then no, each merge in a back-and-forth case doesn't get fragmented. > if the changes in the source are interspersed > with merges from the target, i.e. the merges > from A to B will be excluded from the changes > to be merged from B to A. Do you mean like this? A2 A3 A4 A o-------o-------o-------o / \ / o \ / \ \ / B o---o-------o-------o B2 B3 B4 The merge we're considering is the one shown from B4 to A4. Changes B2 and B4 are changes in the source, interspersed with B3 which is a merge from the target, that is the result of a merge from A to B. A2 will be chosen as the base for a 3-way merge, and the two arms of this merge will be (A2:B4) as the source side and (A2:A4) as the target side. The source side, (A2:B4), is equivalent to the combination of B2 and B4. It can be a bit tricky to understand why it is equivalent; if so, see <>. The point is that changes B2 and B4 don't get split up and merged one after the other in this scenario. >>> This increases the probability of >>> merge conflicts and their likelihood to show up again >>> after each conflict resolution. >> (That is: after each phase of the N revision-range phases that the merge >> got broken into.) >> >>> Moreover, it masks cases >>> where Symmetric Merge might select an optimized >>> merge plan (see wiki for various examples). >> I understand, but I don't know of any examples on the Wiki yet. > > E.g. in a merge from A to B, we can simply take the merge > result of the last B to A merge if there were no later changes > on either side and the B -> A merge did not cherry-pick > (with respect to the node in question). >>> I propose to use the following two-phase merge workflow: >>> >>> (1) Planning phase >>> - read / evaluate the merge info and change history >>> to find the list of revisions to merge >>> - break that down for each node changed on the source side >>> - optionally handle renaming etc. >>> - mark tree conflicts (not sure when to signal them to the >>> user and when to resolve them; high-level fragmentation >>> might become necessary) >>> >>> (2) Execution phase >>> - merge changes on a per-node basis, i.e. all changes of >>> that node get merged before the next node >>> - fragmentation may still be necessary for *that file* >>> - given the "merge plan" for a node, alternative merge >>> strategies may be chosen >> >> OK, the significant change here is not so much the two-phase (although that >> has merit in itself) but rather turning the processing order inside-out from >> (conceptually) >> >> REV_RANGES = [contiguous-revision-ranges needed for TREE] >> for REV_RANGE in REV_RANGES: >> 3-way merge REV_RANGE into the whole TREE >> >> to (conceptually) >> >> for each NODE in the TREE: >> MERGES = [find the best sequence of 3-way merges to merge all needed >> changes into NODE] >> for MERGE in MERGES: >> 3-way-merge MERGE into NODE > > Correctly. With the tiny addition that all rename > and tree conflict handling is done in the planning > phase. That makes it repeatable and potentially > reduces the number of conflicts (at most one per > node instead of one per node and merge step). I get how it reduces conflicts. What do you mean by 'repeatable'? > The second phase is also reduced to purely text- > based processing because all the "structure" > issues have already been resolved. But that might > be more of a mental aid instead of a true reduction > in complexity (we still might need to handle missing > targets on disk etc.). >> This is a good idea in itself. It certainly brings advantages as >> mentioned above. I'm not sure what, if any, disadvantages it might have. > > Memory consumption could be proportional to > the number of changes and renames or such. > Should not be a real-world problem as long as > you don't merge more than a few 100k changes. > > A more severe issue might be latency because > the planning phase needs to gather / evaluate > data from large sections of history. But that one > can be improved ;) Right. >> I don't know that the scenarios this helps with are our main concern at >> this point. It would be good to keep this approach in mind when writing >> or designing any new merge code. In terms of development effort I think >> it's more valuable to concentrate first on the advantages to to-and-fro >> merging (most commonly without subtree merges and without cherry-picks) >> that the 'symmetric merge' brings. > > I think it can become helpful as soon as we track renames because > we need to remap on a per-node basis. To track a renamed node, we need to be able to recognize and act on the information that each node in a merge will be found at three paths that may all be different: its path in the merge-base (aka source-left) tree, its path in the source (aka source-right) tree, and its path in the target tree. Is that what you mean by "remap on a per-node basis"? I think that's something we will need to do anyway, but I don't see that it makes much difference whether we do this once at the planning stage (for a two-phase merge), or do it separately for each sub-merge (for the split-into-revision-ranges method). - Julian