Re: It's time to fix Subversion Merge

Andy Singleton Mon, 11 Jul 2011 10:58:39 -0700

I received a lot of good comments, and I will batch up my responses in this note.

From Stefan, essentially "Can you improve the existing merge"? Yes, I think that we can start with the existing merge code.

However, I also think that any implementation that uses subtree mergeinfo, and does not have extensible mergeinfo, is doomed. Too much effort goes into fixing up the subtree merge feature, and it makes the tree change problems insoluble. So, we need to decisively cut off the subtree options and move to a bigger and more extensible data structure. That's why I proposed adding a new command, "newmerge". The existing code won't be destabilized.

Paul notes that we need test cases. Yes, exactly. The first step in this project is to make some test cases, see how they perform with the existing merge, and describe what users report as the problem with these cases. This will settle the debate about whether the existing merge is good enough. We can classify an alternate merge implementation according to how many additional cases it handles correctly. I think a test case is more than a patch. It is a series of commit and merge operations.
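To make "a series of commit and merge operations" concrete, here is a rough sketch of how such a test case could be written down as data. Everything in it (Step, MergeTestCase, score, the branch names) is hypothetical and only for illustration; it is not an existing Subversion test harness.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One operation in a merge scenario (hypothetical structure)."""
    action: str        # "commit" or "merge"
    branch: str        # branch the step is applied to
    source: str = ""   # source branch, used only by "merge" steps
    note: str = ""     # what the step changes, in plain words


@dataclass
class MergeTestCase:
    name: str
    steps: list = field(default_factory=list)
    expected: str = ""  # what a user would call the correct outcome

    def merges(self):
        """Return only the merge steps, in order."""
        return [s for s in self.steps if s.action == "merge"]


# A cyclic scenario: trunk is merged into the branch, then the branch
# is merged back into trunk.
cyclic = MergeTestCase(
    name="cyclic-merge",
    steps=[
        Step("commit", "trunk", note="edit foo.c"),
        Step("commit", "branch", note="edit bar.c"),
        Step("merge", "branch", source="trunk"),
        Step("commit", "branch", note="edit foo.c again"),
        Step("merge", "trunk", source="branch"),
    ],
    expected="trunk receives the branch edits with no spurious conflicts",
)


def score(results):
    """Count how many test cases an implementation handled correctly.

    results maps a test case name to True (handled) or False (not handled).
    """
    return sum(1 for passed in results.values() if passed)
```

Two merge implementations could then be compared by running the same list of cases through each and comparing their scores.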

Mark and C. Michael Pilato raise the most serious issue. Subversion merge problems come from the core architecture and have persisted over many years. A complete fix may require a more radical change. And, it is possible that SVN needs a bigger redesign even to meet the goals I put out today. You have more experience with that than I do. We will see. At this point, I think that merge can be significantly improved for the existing server architecture.

Yes, the "cyclic merge" problem is a big one, and along with the tree change problem, it accounts for most of the frustrating behavior of Subversion merge - http://subversion.tigris.org/issues/show_bug.cgi?id=2837

I believe that cyclic merges can be handled with a bigger merge_history / mergeinfo file. When you do a merge, you make some edits to resolve problems. Then, you commit the changes - all of the merged changesets, plus the edits. You also write the instructions for resolving this merge into the merge_history / mergeinfo file. The next time you go to do a merge, you can replay any of the changes that you need. The new merge_history will be a big file with a complete history.
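As a minimal sketch of the idea, assuming the merge_history stores, for each merge, the source branch, the changesets it carried, and the edits made to resolve it: the MergeRecord and MergeHistory names and the in-memory layout are my own assumptions for illustration, not a proposed file format.

```python
from dataclasses import dataclass, field


@dataclass
class MergeRecord:
    source: str        # branch that was merged from
    revisions: list    # changesets carried by this merge
    resolutions: dict = field(default_factory=dict)  # path -> resolved text


@dataclass
class MergeHistory:
    """Hypothetical in-memory stand-in for the merge_history file."""
    records: list = field(default_factory=list)

    def record(self, source, revisions, resolutions=None):
        """Append one merge: its changesets plus the resolving edits."""
        self.records.append(
            MergeRecord(source, list(revisions), dict(resolutions or {}))
        )

    def already_merged(self, source):
        """Return the set of revisions already merged from source."""
        merged = set()
        for rec in self.records:
            if rec.source == source:
                merged.update(rec.revisions)
        return merged

    def pending(self, source, candidates):
        """Return the candidate revisions not yet merged, in order."""
        done = self.already_merged(source)
        return [rev for rev in candidates if rev not in done]

    def replay_resolution(self, path):
        """Return the most recent recorded resolution for path, or None."""
        for rec in reversed(self.records):
            if path in rec.resolutions:
                return rec.resolutions[path]
        return None


history = MergeHistory()
history.record("trunk", [10, 11], {"foo.c": "resolved text"})
remaining = history.pending("trunk", [10, 11, 12])  # [12]
```

The point of keeping the resolutions next to the changeset list is that a later merge in the other direction can look up how a conflict was already settled instead of raising it again.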

This won't be a simple implementation, but the inside of a merge is never simple. We need to add intelligence to the merge so that it looks simple to the user. This intelligence can be incrementally improved through test cases and the open source process.

New architecture might be required for handling moved and renamed paths. This is a problem that comes up frequently in merges. However, it also comes up in normal updates. From a merge point of view, moved files should actually move and drag their changes with them, rather than appear as new files with copy+delete.

* After we map to new files (manually, or with an algorithm) in an update or a merge, we should remember the change in the merge_history. That's why we make the history extensible.
* To automate this process, I think that moved files should be identified by filename and tree structure, not by file ID. Yes, this is a change in the way that Subversion thinks, but it is clearly a problem that needs to be fixed. Other SCM systems like git use an algorithm that makes a best guess on tree matches. As noted by Greg, git doesn't do any other type of move tracking, and git merge works well.
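A best-guess match of this kind might look roughly like the sketch below. It pairs paths that disappeared with paths that appeared, scoring each pair by text similarity plus a small bonus for an unchanged filename. The function name match_moves, the threshold, the bonus, and the use of content similarity are all my assumptions for illustration; this is not how git or Subversion actually implements it.

```python
from difflib import SequenceMatcher
import posixpath


def match_moves(deleted, added, threshold=0.6):
    """Pair deleted paths with added paths that look like the same file.

    deleted and added map a path to its file content (str).
    Returns a dict of old_path -> new_path for the accepted pairs.
    """
    candidates = []
    for old, old_text in deleted.items():
        for new, new_text in added.items():
            # Content similarity in the range 0.0 to 1.0.
            score = SequenceMatcher(None, old_text, new_text).ratio()
            # Small bonus (an arbitrary choice) when the filename is unchanged.
            if posixpath.basename(old) == posixpath.basename(new):
                score += 0.1
            if score >= threshold:
                candidates.append((score, old, new))

    moves = {}
    used = set()
    # Accept the best-scoring pairs first; each path is used at most once.
    for score, old, new in sorted(candidates, reverse=True):
        if old not in moves and new not in used:
            moves[old] = new
            used.add(new)
    return moves


deleted = {"src/util.c": "int add(int a, int b) { return a + b; }\n"}
added = {
    "lib/util.c": "int add(int a, int b) { return a + b; }\n",
    "docs/README": "How to build the project\n",
}
moves = match_moves(deleted, added)  # {"src/util.c": "lib/util.c"}
```

Whatever mapping comes out of such a guess, or out of a manual correction, is what would be remembered in the merge_history.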

The work noted by Stefan on truMerge is a good example of this strategy. We can do the same thing - http://trumerge.open.collab.net/ . I completely agree with the major points in this implementation:

1) It uses "heuristics" to map trees together

2) "All merges are done at the root of the branch" and "All merges are complete (no merges in sparse working copies, etc.)"

You can see that getting rid of the subtree merges is a necessary and probably sufficient step for fixing the tree change problems.

Mark asks where we get the GUID/UUID for foreign merges. It already exists, because we have a server UUID, as Daniel wrote: <repository_UUID-revision_number>. We just need to keep track of it.
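For illustration, an identifier of the form <repository_UUID-revision_number> could be built and parsed as in the sketch below. The ChangesetId type and parse_changeset_id helper are hypothetical names, and the example UUID is made up.

```python
import uuid
from typing import NamedTuple


class ChangesetId(NamedTuple):
    """Hypothetical global changeset identifier: repository UUID + revision."""
    repo_uuid: uuid.UUID
    revision: int

    def __str__(self):
        return f"{self.repo_uuid}-{self.revision}"


def parse_changeset_id(text):
    """Parse '<repository_UUID>-<revision_number>' into a ChangesetId."""
    # The UUID itself contains hyphens, so split on the last one only.
    repo_part, _, rev_part = text.rpartition("-")
    return ChangesetId(uuid.UUID(repo_part), int(rev_part))


# Made-up repository UUID, for illustration only.
cid = ChangesetId(uuid.UUID("12345678-1234-5678-1234-567812345678"), 42)
text = str(cid)
```

Because the repository UUID is part of the key, revision 42 from one repository can never be confused with revision 42 from another when tracking foreign merges.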

In systems like git, if the user wants to cherrypick, the user must enter the complete GUID/UUID. However, it is probably not relevant for Subversion. You can only cherrypick complete commits from the source, not from other sources. So, you can leave out the UUID and just specify the revision number. You can get complete merge commits with this technique. Unfortunately, you are not guaranteed to have access to individual commits that were inside the merge. Because of this, changesets inside merge commits will be vulnerable to "conflation": you will have to sort through cases where you already have some but not all of the changes that were in a merge commit you are merging, and you won't be able to cherrypick inside the merge commit. I need to think more about this case, and whether we should track individual commits that were merged. That could be an extension.



On 7/11/2011 12:51 PM, C. Michael Pilato wrote:

On 07/11/2011 11:46 AM, Andy Singleton wrote:

  Many developers are moving from Subversion to other SCM systems that have
better merge capabilities. I have posted an article with a proposal to fix
this problem, here:

http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx

[...]

I think that we can build a newmerge prototype by stripping down the
existing merge to remove the subtree options, and moving to the extensible
mergeinfo format. It will be useful to get advice about this from experienced
team members.

Your optimism is lovely (and welcome, even!), but I am not as convinced as
you that the reason why Subversion's merge functionality is subpar is as
superficial as the items you call out (and which are implied by your
prototyping plan above).

Very little (if anything) about your proposal touches on the *real*
problems, such as Subversion's handling of moved/renamed objects, tree
conflict detection/handling/resolution, changeset conflation caused by the
fundamental diff+patch approach Subversion takes to merges (rather than
first-class changeset support), etc.  These real problems with merging were
documented many years before the merge tracking feature was ever conceived,
and neither that feature nor its skin-deep-only warts you aim to address
made a dent in solving those very real problems.

I don't aim to discourage -- far from it!  On the contrary, I want to
encourage a deeper review of the situation.  It's entirely possible that, in
doing so, you will find solutions for the deeper core problems here, and
obviously the Subversion community (devs and users alike) would love that!

-- C-Mike

[1] I'll grant that in your blog post, you at least acknowledge the tree
changes problem and place great stock in your extensible merge tracking
format toward some future solution.



--
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton
