I received a lot of good comments, and I will batch up my responses in this note.

Stefan asks, essentially, "Can you improve the existing merge?" Yes, I think that we can start with the existing merge code.

However, I also think that any implementation that uses subtree mergeinfo, and does not have extensible mergeinfo, is doomed. Too much effort goes into fixing up the subtree merge feature, and that feature makes the tree change problems insoluble. So we need to decisively cut off the subtree options and move to a bigger, more extensible data structure. That's why I proposed adding a new command, "newmerge", so that the existing code won't be destabilized.

Paul notes that we need test cases. Yes, exactly. The first step in this project is to build some test cases, see how they perform with the existing merge, and describe what users report as the problem in each case. This will settle the debate about whether the existing merge is good enough, and we can rank any alternate merge implementation by how many additional cases it handles correctly. I think a test case is more than a patch: it is a series of commit and merge operations.
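To make "a series of commit and merge operations" concrete, here is a minimal sketch of one way a test case could be written down as data and rendered into a shell script. The `Edit`/`Merge` step types, the `render` helper, and the working-copy layout (a checkout of the repository root in which `trunk` and `branches/feature`, each with a `README`, already exist) are hypothetical; only the `svn update`, `svn merge URL WCPATH`, and `svn commit` invocations are ordinary Subversion command-line usage.

```python
import shlex  # shlex.join needs Python 3.8+
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Append a line to an existing file on a branch, then commit."""
    branch: str  # branch path under the repository root, e.g. "trunk"
    path: str    # file inside that branch (assumed to exist already)
    line: str    # text appended before committing


@dataclass(frozen=True)
class Merge:
    """Merge everything eligible from `source` into `target`, then commit."""
    source: str
    target: str


def render(steps, repo_url, wc):
    """Render a scenario as shell commands, one string per command.

    Assumes `wc` is a checkout of the repository root in which every
    branch and file named by the steps already exists.
    """
    cmds = []
    for step in steps:
        if isinstance(step, Edit):
            branch_wc = f"{wc}/{step.branch}"
            target_file = f"{branch_wc}/{step.path}"
            cmds.append(f"echo {shlex.quote(step.line)} >> {shlex.quote(target_file)}")
            cmds.append(shlex.join(
                ["svn", "commit", "-m", f"Edit {step.path} on {step.branch}", branch_wc]))
        elif isinstance(step, Merge):
            target_wc = f"{wc}/{step.target}"
            cmds.append(shlex.join(["svn", "update", wc]))
            cmds.append(shlex.join(["svn", "merge", f"{repo_url}/{step.source}", target_wc]))
            cmds.append(shlex.join(
                ["svn", "commit", "-m", f"Merge {step.source} into {step.target}", target_wc]))
        else:
            raise TypeError(f"unknown step: {step!r}")
    return cmds


# A cyclic scenario: changes flow trunk -> feature -> trunk -> feature -> trunk.
CYCLIC = [
    Edit("trunk", "README", "a"),
    Merge("trunk", "branches/feature"),
    Edit("branches/feature", "README", "b"),
    Merge("branches/feature", "trunk"),
    Merge("trunk", "branches/feature"),  # trunk's merge commit flows back
    Edit("branches/feature", "README", "c"),
    Merge("branches/feature", "trunk"),
]

if __name__ == "__main__":
    for cmd in render(CYCLIC, "file:///tmp/repo", "wc"):
        print(cmd)
```

Keeping the scenario as data means the same case can be run against the existing `svn merge` and against any alternate implementation, and the results compared step by step.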

Mark and C. Michael Pilato raise the most serious issue: Subversion's merge problems come from the core architecture and have persisted for many years. A complete fix may require a more radical change, and it is possible that SVN needs a bigger redesign even to meet the goals I put out today. You have more experience with that than I do; we will see. At this point, I think that merge can be significantly improved within the existing server architecture.

Yes, the "cyclic merge" problem is a big one. Along with the tree change problem, it accounts for most of the frustrating behavior of Subversion merge: http://subversion.tigris.org/issues/show_bug.cgi?id=2837

I believe that cyclic merges can be handled with a bigger merge_history / mergeinfo file. When you do a merge, you make some edits to resolve problems. Then you commit the changes: all of the merged changesets, plus the edits. You also write the instructions for resolving this merge into the merge_history / mergeinfo file. The next time you do a merge, you can replay whichever of those changes you need. The new merge_history will be a big file with a complete history.
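A minimal sketch of how such a merge_history could drive the next merge. The record layout, the `extra` slot for resolution instructions, and `plan_merge` are all hypothetical illustrations, not an existing Subversion format. The idea shown: skip source revisions the target already has, and for merge commits on the source that pulled changes *from* the target (the cyclic case), replay only their recorded resolution edits instead of re-merging the target's own changes.

```python
from dataclasses import dataclass, field


@dataclass
class MergeRecord:
    """One entry in a hypothetical merge_history file."""
    source: str           # branch the changes came from
    revisions: list[int]  # source revisions included in this merge
    commit: int           # revision of the resulting merge commit
    # Extensible slot: conflict-resolution edits, path mappings, etc.
    extra: dict = field(default_factory=dict)


@dataclass
class MergeHistory:
    records: list[MergeRecord] = field(default_factory=list)

    def merged_from(self, source):
        """Every revision of `source` already merged into this branch."""
        return {rev for rec in self.records if rec.source == source
                for rev in rec.revisions}


def plan_merge(source, target, source_revs, source_history, target_history):
    """Split the source's revisions into what to replay and what to skip.

    Returns (replay, resolutions_only):
      replay: ordinary source commits not yet in the target
      resolutions_only: source merge commits that pulled changes from the
        target; only their recorded resolution edits still need replaying
    """
    already = target_history.merged_from(source)
    reflective = {rec.commit for rec in source_history.records
                  if rec.source == target}
    replay, resolutions_only = [], []
    for rev in sorted(source_revs):
        if rev in already:
            continue  # target already has this change
        if rev in reflective:
            resolutions_only.append(rev)
        else:
            replay.append(rev)
    return replay, resolutions_only
```

For example, if trunk merged feature r8 in r10, and feature r11 merged trunk r5 and r10 back (with resolution edits), merging feature r8, r9, r11, r13 into trunk would replay r9 and r13 in full and only the recorded resolutions from r11.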

This won't be a simple implementation, but the inside of a merge is never simple. We need to add intelligence to the merge so that it looks simple to the user. This intelligence can be incrementally improved through test cases and the open source process.

New architecture might be required for handling moved and renamed paths. This problem comes up frequently in merges, but it also comes up in normal updates. From a merge point of view, moved files should actually move and drag their changes with them, rather than appear as new files via copy+delete.

* After we map old paths to new files (manually, or with an algorithm) in an update or a merge, we should remember the mapping in the merge_history. That's why we make the history extensible.
* To automate this process, I think that moved files should be identified by filename and tree structure, not by file ID. Yes, this is a change in the way that Subversion thinks, but it is clearly a problem that needs to be fixed. Other SCM systems like git use an algorithm that makes a best guess at tree matches. As Greg noted, git doesn't do any other type of move tracking, and git merge works well.
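A minimal sketch of a best-guess move detector, loosely in the spirit of git's heuristic matching (git scores candidate pairs mainly by content similarity). Everything here is a hypothetical illustration: `detect_moves`, the 0.6 threshold, and the small bonus for keeping the same file name are arbitrary choices, and `difflib.SequenceMatcher` stands in for whatever similarity measure a real implementation would use.

```python
import difflib
import posixpath


def similarity(a, b):
    """Content similarity in [0, 1], using difflib's ratio."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def detect_moves(deleted, added, threshold=0.6, name_bonus=0.1):
    """Greedily pair deleted paths with added paths.

    `deleted` and `added` map path -> file text. Each deleted file is paired
    with the added file it most resembles, with a small bonus for keeping the
    same file name. Pairs below `threshold` similarity are left alone and
    stay an ordinary delete plus add. Returns {old_path: new_path}.
    """
    moves = {}
    remaining = dict(added)
    for old_path, old_text in sorted(deleted.items()):
        best_path, best_score, best_sim = None, -1.0, 0.0
        for new_path, new_text in remaining.items():
            sim = similarity(old_text, new_text)
            score = sim
            if posixpath.basename(new_path) == posixpath.basename(old_path):
                score += name_bonus  # same file name: likely the same file
            if score > best_score:
                best_path, best_score, best_sim = new_path, score, sim
        if best_path is not None and best_sim >= threshold:
            moves[old_path] = best_path
            del remaining[best_path]  # each added file matches at most once
    return moves
```

Once a mapping like this is found (or confirmed by the user), it is exactly the kind of entry that would be recorded in an extensible merge_history, so later merges can carry changes along the moved path.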

The truMerge work noted by Stefan (http://trumerge.open.collab.net/) is a good example of this strategy, and we can do the same thing. I completely agree with the major points in this implementation:
1) It uses "heuristics" to map trees together
2) "All merges are done at the root of the branch" and "All merges are complete (no merges in sparse working copies, etc.)"

You can see that getting rid of subtree merges is a necessary, and probably sufficient, step toward fixing the tree change problems.

Mark asks where we get the GUID/UUID for foreign merges. It already exists: every repository has a UUID, so, as Daniel wrote, a changeset can be identified as <repository_UUID-revision_number>. We just need to keep track of it.
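A small sketch of building and parsing identifiers in the <repository_UUID-revision_number> form. The function names are hypothetical. The one subtlety worth showing is that the UUID itself contains hyphens, so parsing must split on the last hyphen only.

```python
import uuid


def make_changeset_id(repo_uuid, revision):
    """Build a <repository_UUID-revision_number> identifier.

    uuid.UUID validates the UUID and normalizes it to lowercase.
    """
    if revision < 0:
        raise ValueError("revision numbers are non-negative")
    return f"{uuid.UUID(repo_uuid)}-{revision}"


def parse_changeset_id(changeset_id):
    """Split an identifier back into (repository UUID, revision number).

    The UUID contains hyphens itself, so split on the last one only.
    """
    repo, rev = changeset_id.rsplit("-", 1)
    return str(uuid.UUID(repo)), int(rev)
```

The UUID used in the usage below is made up for illustration.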

In systems like git, a user who wants to cherry-pick must identify the commit by its globally unique ID. That is probably not necessary for Subversion: you can only cherry-pick complete commits from the source repository, not from other sources, so you can leave out the UUID and just specify the revision number. You can get complete merge commits with this technique. Unfortunately, you are not guaranteed to have access to the individual commits that went into a merge. Because of this, changesets inside merge commits will be vulnerable to "conflation": you will have to sort through cases where you already have some, but not all, of the changes in a merge commit you are merging, and you won't be able to cherry-pick inside the merge commit. I need to think more about this case, and about whether we should track the individual commits that were merged. That could be an extension.
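If merge commits did record the IDs of the individual commits inside them (the possible extension mentioned above), the conflation case could at least be detected. A hypothetical sketch: `classify_merge_commit` and the idea of an `inner` set of recorded changeset IDs are assumptions, not existing Subversion behavior. An empty `inner` stands for today's situation, where nothing is recorded and the merge tool cannot tell.

```python
def classify_merge_commit(inner, already_have):
    """Classify a merge commit against changesets the target already has.

    `inner` is the set of changeset IDs recorded inside the merge commit
    (hypothetical extension); `already_have` is what the target contains.
    Returns (status, missing), where status is "unknown", "new",
    "already merged", or "partial" (the conflation case).
    """
    if not inner:
        return "unknown", []  # nothing recorded: cannot tell
    present = inner & already_have
    missing = sorted(inner - already_have)  # sorted for stable output
    if not present:
        return "new", missing
    if not missing:
        return "already merged", missing
    return "partial", missing
```

Only the "partial" result needs special handling; with the inner IDs recorded, the merge could also offer to cherry-pick just the missing changesets.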


On 7/11/2011 12:51 PM, C. Michael Pilato wrote:
> On 07/11/2011 11:46 AM, Andy Singleton wrote:
>> Many developers are moving from Subversion to other SCM systems that have
>> better merge capabilities. I have posted an article with a proposal to fix
>> this problem, here:
>>
>> http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx
>> [...]
>>
>> I think that we can build a newmerge prototype by stripping down the
>> existing merge to remove the subtree options, and moving to the extensible
>> merginfo format. It will be useful to get advice about this from experienced
>> team members.
>
> Your optimism is lovely (and welcome, even!), but I am not as convinced as
> you that the reason why Subversion's merge functionality is subpar is as
> superficial as the items you call out (and which are implied by your
> prototyping plan above).
>
> Very little (if anything) about your proposal touches on the *real*
> problems, such as Subversion's handling of moved/renamed objects, tree
> conflict detection/handling/resolution, changeset conflation caused by the
> fundamental diff+patch approach Subversion takes to merges rather than
> first-class changeset support), etc.  These real problems with merging were
> documented many years before the merge tracking feature was ever conceived,
> and neither that feature nor its skin-deep-only warts you aim to address
> made a dent in solving those very real problems.
>
> I don't aim to discourage -- far from it!  On the contrary, I want to
> encourage a deeper review of the situation.  It's entirely possible that, in
> doing so, you will find solutions for the deeper core problems here, and
> obviously the Subversion community (devs and users alike) would love that!
>
> -- C-Mike
>
> [1] I'll grant that in your blog post, you at least acknowledge the tree
> changes problem and place great stock in your extensible merge tracking
> format toward some future solution.



--
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton
