Re: Problems Reintegrating the fsfs-format7 branch

Paul Burba Thu, 21 Mar 2013 08:25:49 -0700

On Fri, Mar 8, 2013 at 4:17 PM, Paul Burba <ptbu...@gmail.com> wrote:
> On Tue, Mar 5, 2013 at 11:37 AM, Julian Foad <julianf...@btopenworld.com> 
> wrote:
>> Paul Burba wrote:
>>> On Wed, Feb 27, 2013 at 5:04 PM, Julian Foad wrote:
>>>>  Paul Burba wrote:
>>>>>  I found the cause of the conflict filled reintegrate merge.  The
>>>>>  automatic merge code seems to be doing the right thing re Mark's
>>>>>  automatic merge above, the problem arises earlier.
>>>>
>>>>  Paul, thanks for investigating.  While I'm glad to hear the automatic
>>>> merge was not the main root cause of the problem, it would make sense
>>>> for it to detect that merges in the opposite direction have been done
>>>> and at least warn the user.
>>>
>>> Hi Julian,
>>>
>>> What are we going to warn them to do exactly?
>>
>> What I was thinking of here is to detect the most general condition that 
>> isn't handled properly.  Let me explain this part, although I'm sure you're 
>> familiar with it.
>>
>> Whenever the direction of merge is from A to B and the automatic merge code 
>> doesn't detect that a reintegrate merge is required, and yet one or more 
>> cherry-picked revision ranges and/or subtree merges have been merged from B 
>> to A, Subversion currently doesn't notice those merges from B to A, and 
>> proceeds to merge "everything" from A to B including those changes that were 
>> the result of B-to-A merges.  That is what the plain "merge" from A to B has 
>> always done -- it has always ignored any mergeinfo describing merges from B 
>> to A.
>
> Yes, http://subversion.tigris.org/issues/show_bug.cgi?id=4255 for the curious.
>
>> In the example at hand, there has been a B-to-A merge of everything from B 
>> (except some non-operative revisions), and so we get loads of conflicts by 
>> ignoring this when we merge the other way.  The intelligent user thinks, 
>> "This is clearly a reintegrate scenario."
>
> Do they?  I'm not very confident that most users will immediately
> think that, especially since we've deprecated the --reintegrate
> option.  Even if they do think "I need a reintegrate style merge!" I
> suspect they are more likely to do just that, explicitly using the
> --reintegrate option, rather than re-syncing B->A and then repeating
> the automatic "reintegrate style" merge.
>
>> On the other hand, if there has been a B-to-A merge of not quite everything 
>> from B (leaving an operative unmerged part at the beginning or somewhere in 
>> the middle), then an old "reintegrate" merge from A to B would error out, 
>> whereas an old non-reintegrate merge would just do the merge and wrongly 
>> re-merge all the B-to-A changes back to B.  The new automatic merge, if it 
>> determines (based on the merging of the branch root node) to do a 
>> non-reintegrate, currently does the latter.
>>
>> At some point, maybe not for 1.8, if we can detect any such condition, it 
>> would be a good to warn the user:
>>
>>   A clean merge from A to B is not possible because some but not all
>>   changes have been merged from B to A.  You're going to see conflicts
>>   (unless all your potential conflicts are trivially auto-resolved).
>
> But a reintegrate style merge already does this, if the automatic
> merge could only determine up front that that is the correct type of
> merge.
>
>> We could add some suggestions to the warning:
>>
>>   If you are expecting this to be a reintegrate-like merge, check that
>>   all changes on B including non-operative revision ranges have been
>>   merged to A and recorded in the mergeinfo; try an automatic sync merge
>>   from B to A to fill in any gaps.
>>
>>   If you are expecting a sync merge, note that there have been some
>>   cherry-pick and/or subtree merges from B to A, and this merge will
>>   attempt to merge these same changes back to B, which will likely
>>   produce conflicts.
>>
>> We could add more specific details of what's been found, and more specific 
>> suggestions related to the details, such as the case at hand where all 
>> operative revs have been merged B-to-A, if we think it's worth the effort.
>>
>> A basic implementation would not be cheap if there is a significant amout of 
>> subtree mergeinfo to check.  And I have carefully avoided the issue of 
>> whether we should error out or proceed with the merge, as there's no point 
>> bike-shedding about that unless we really decide to do this.
>>
>> So let's take a closer look at what we could do for the specific case at 
>> hand.
>>
>>
>>>  To explicitly use
>>> --reintegrate option?  Because that would work in Stefan's case, so we
>>> should just do it rather than warn...unless you mean something else.
>>
>> So, you're thinking more about detecting the very specific kind of scenario 
>> that Stefan encountered -- where everything except some leading 
>> non-operative revisions had been merged B-to-A, and a reintegrate would give 
>> the "correct" result.
>
> Why is "correct" in quotes?  How is a reintegrate style merge ever the
> wrong choice in that scenario?
>
> Actually, don't answer that, because I'm thinking of a more general
> case: The missing non-operative revisions may be leading, as in
> Stefan's case, or they may be interspersed with operative revisions
> that have already been merged (including subtree merges where the
> change is only operative in the given subtree -- i.e. would be a no-op
> even if repeated at the branch root).  Right now the automatic merge
> code (i.e. merge.c:find_last_merged_location()) obviously doesn't
> handle this, but we already have code that can answer this question:
> svn_client_mergeinfo_log2[1].
>
> Here's an example from our test suite of what I mean:
>
> Run merge_reintegrate_tests.py 13 then revert the local changes the
> test leaves behind.  We want to merge A_COPY to A, but there have been
> prior cherrypicks and subtree merges from A to A_COPY:
>
>>svn pl -vR A_COPY
> Properties on 'A_COPY':
>   svn:mergeinfo
>     /A:6
> Properties on 'A_COPY\B':
>   svn:mergeinfo
>     /A/B:5
> Properties on 'A_COPY\D\G\rho':
>   svn:mergeinfo
>     /A/D/G/rho:4
> Properties on 'A_COPY\D\H':
>   svn:mergeinfo
>     /A/D/H:3,6
>
> However, we can see that these prior cherrypicks and subtree merges
> actually constitute all operative revisions up to r8:
>
>>svn mergeinfo --show-revs eligible -R ^^/A A_COPY
> r9
>
> So an explicit reintegrate merge works fine:
>
>>svn merge ^^/A_COPY A --reintegrate
> --- Merging differences between repository URLs into 'A':
> U    A\mu
> --- Recording mergeinfo for merge between repository URLs into 'A':
>  U   A
>
> But an automatic merge doesn't notice this, wrongly chooses a sync
> style merge and results in spurious changes/conflicts:
>
>>svn revert -Rq .
>
>>svn merge ^^/A_COPY A
> --- Merging r2 through r9 into 'A':
> U    A\mu
> C    A\D\H\psi
> --- Recording mergeinfo for merge of r2 through r9 into 'A':
>  U   A
> Summary of conflicts:
>   Text conflicts: 1
> Conflict discovered in file 'A\D\H\psi'.
> Select: (p) postpone, (df) diff-full, (e) edit, (m) merge,
>         (mc) mine-conflict, (tc) theirs-conflict, (s) show all options: p
>
> So my point is simply that if an explicit --reintegrate merge can
> DTRT, why can't an automatic merge?
>
> [1] calculate_left_hand_side also does effectively the same thing --
> which makes me wonder if we can't use svn_client_mergeinfo_log2 there,
> but that's another issue.
>
>> In an immediate sense, certainly we could "just do it", special-case this 
>> particular scenario (have you thought about what the exact condition should 
>> be?) and automatically choose the reintegrate code path in that case.
>
> See above "I'm thinking of a more general case".
>
>> The trouble is, that doesn't resolve the situation completely.  It gives the 
>> correct result in terms of the changes merged, but it doesn't straighten out 
>> the mergeinfo.
>
> You mean the mergeinfo on the source?
>
>> That means we have to make sure the rest of the merge subsystem consistently 
>> handles this case too: the "sync"-style merge, merge graphing tools that 
>> determine what kind of merge was done and what kinds of merge are eligible, 
>> and so on.  See below.
>
> I'm not sure how this is a problem.  If we do a reintegrate style
> merge in the general situation I describe above, how does that break
> subsequent sync style merges?  Can you point to an example?  in terms
> of determining eligible revisions, 'svn mergeinfo' already handles
> these gaps.  Which specific graphing tools are you speaking of?
>
>>> I filed an issue for this:
>>> http://subversion.tigris.org/issues/show_bug.cgi?id=4329 and added a
>>> test which demonstrates a very simple version of the problem.  I set
>>> issue #4329 a 1.8 blocker.  While I think what Stefan is doing is
>>> somewhat atypical[1], if we are deprecating --reintegrate, then
>>> automatic merges should handle cases like this.  After all, if we
>>> explicitly use the --reintegrate option the merge works fine.  What do
>>> others think?
>>
>> I agree it's atypical (or rather unintended) usage, and I also acknowledge 
>> that if Stefan used merge like this, then sometimes other people will do too.
>>
>> However, I don't want us to put any more special cases in the code, 
>> especially without putting them into the *design*.  If the old 
>> "--reintegrate" accepts a missing leading no-op revision range in 
>> considering whether "everything" has been merged, that should now be 
>> considered a bug.
>
> Seriously?  As long as we allow cherrypicks and subtree merges I think
> we want to support use cases like this...to call it a bug that we
> DTRT, I guess I just don't get it.
>
>> The new rule is that if every revision on B is *recorded* as merged to A, 
>> then we can reintegrate from A to B, else we can't.
>
> I never came away with the understanding in all the discussion of
> automatic merges...but maybe I'm the only one.
>
>> The main alternative would be to adopt some principle such as "When 
>> determining what's been merged and what needs to be merged, we only notice 
>> operative revisions, aka non-null changes".  I considered that during the 
>> design of automatic merge, and was strongly drawn to the idea and started 
>> prototyping around it, but that turned out to be a rat-hole involving 
>> potentially exhaustive traversal of the whole mergeinfo graph (all branches) 
>> to determine whether a given bit of mergeinfo represented an operative or an 
>> inoperative change.  And the benefits are near zero, because both sync and 
>> reintegrate merges were already filling in any gaps in the mergeinfo in the 
>> direction of the merge so the situation can only arise if you use manual 
>> merges.
>>
>> The sensible choice in the end was to require that revision ranges are 
>> explicitly recorded whether operative or not.
>>
>> Having found a case that we suspect may occur frequently enough to cause 
>> annoyance, I believe the best thing to do is to admit that we don't handle 
>> it automatically.  Just because the old reintegrate handles this particular 
>> case doesn't mean we're making a mistake if the new automatic merge doesn't. 
>>  This behaviour change is a small adjustment towards overall more consistent 
>> merge behaviour, and consistency is key to large-scale usability, much more 
>> so than special casing is.
>>
>> So, what to do exactly?  Options seem to be:
>>
>>   0) leave it as it is
>>
>>   1) detect this specific case and warn or error out
>>
>>   2) detect this specific case and do a reintegrate
>
> The attached patch accomplishes, let's call it "2.5", because it
> handles the more general case I outlined above, but not the case
> represented by issue #4255.  Would you mind taking a look?


fyi http://subversion.tigris.org/issues/show_bug.cgi?id=4329#desc7
explains my proposed patch in terms of how it changes the symmetric
merge algorithm.

-- 
Paul T. Burba
CollabNet, Inc. -- www.collab.net -- Enterprise Cloud Development
Skype: ptburba

Re: Problems Reintegrating the fsfs-format7 branch

Reply via email to