Philip Martin wrote:
>> On 30.08.2013 17:18, Julian Foad wrote:
>>> The client *does* need to know that B is in fact being moved to C, so 
>>> that it can offer to transfer my local modifications from B to C.
> 
> Allowing multiple moves to the same destination doesn't really fit with
> Ev2 but I can imagine an enhanced Ev1 editor drive that does
> 
>    move away A
>    move to C (original A)
>    del A
>    move away B
>    move to C (original B)
>    del B
>    add C
> 
> The deletes lead to tree-conflicts on A and B due to local mods.  The
> add creates a pristine C with no local mods.  The user resolves the
> tree-conflicts post-update and gets to choose which local mods are
> merged to C, possibly both but one before the other, which may in turn
> raise text/prop/tree-conflicts on C.

This is good as far as local mods are concerned.  The decision of how to 
combine A and B, or raise a conflict, is left to the WC client code as it 
should be, and sufficient information is provided about the nature of the 
changes.

Functionally, this would work as you've shown.  Regarding updating the base 
tree, however, note that the plain "add" in Ev1 is a node-by-node add of the 
whole subtree, which could be arbitrarily large.  For performance, we must not 
use a plain add, as a move should be designed to execute in O(1) time.[1]

So, one of the base trees (at A or B) should be deleted, while the other should 
be moved over to 'C' and updated.  The edit producer must therefore make a 
decision between A and B, and send either

  [move-info...]
  del A   // but WC shall not delete the base tree
  del B
  add C (copy-from A@10)  // WC shall move the base tree from A
  modify... C/...  // diff A@10:C@30

or

  [move-info...]
  del A
  del B   // but WC shall not delete the base tree
  add C (copy-from B@20)  // WC shall move the base tree from B
  modify... C/...  // diff B@20:C@30

(The ideal choice of which base-tree to move would be the one with the smaller 
delta to C@30, but that is hard to determine efficiently.  The one with the 
closer revision number, in this case B, would often be a good choice.  However, 
each of A and B could be a mixed-rev subtree.  A good choosing algorithm is far 
from obvious.  That this problem exists at all might be a clue that something 
deeper is wrong.)

But notice how we're really trying to send a combination of two different 
messages:

  * The incoming logical change, that must be merged with any local mods, is:

    - mv(A,C) && modify-tree(diff(A10:C30))
    - mv(B,C) && modify-tree(diff(B20:C30))

  * To produce the new base tree, one reasonably efficient way is:

    - rm(A)
    - mv(B,C)
    - modify-tree(C, diff(B20:C30))

What we've been striving for above is a kind of hybrid of those two messages.

But let's try to take a step back and look at how we got into this surprisingly 
ugly case in the first place.  The real problem stems from an "impedance 
mismatch" between the path-centric semantics of the existing WC and the 
line-of-history semantics that "moves" introduces.

Why do we start off with the expectation that this update (A@10 and B@20 -> 
C@30) should resolve smoothly into C@30?  Because, in our old-fashioned 
thinking, this was a "normal" mixed-revision WC, and any "normal" WC should 
update smoothly.  But what if we instead started off with this WC (and the same 
repository):

  old_B->A@10
  B@20

Then we would say that 'old_B' is 'switched' and we would not expect an update 
to bring this WC to contain just a child 'C'.  We might expect the child named 
'old_B' to be deleted (if we're still clinging to the old copy-and-delete ideas 
about how a move should be handled), or in a move-aware world we might expect 
'old_B' to remain present but end up pointing at '^/C@30'.

So, back to the original example, I'm thinking that the child 'A@10' should be 
treated as a switched child.  In a move-aware world, that child should not be 
found in the WC at all at the same time as 'B@20' unless somebody did a 
deliberate kind of move-defeating update.

But the WC doesn't know 'A@10' and 'B@20' are the same repository node.  The WC 
doesn't know about node-ids or node-history relationships.  This is the 
problem.  Maybe the WC needs to know.

If the WC doesn't know about node-id relationships, then it cannot prevent the 
following silly situations from arising:

(1) Start from:

  A@10
  B@20

  Then move A to B/A, and try to commit.  The commit will fail because 
you cannot move the same node into a child of itself.  The detection 
occurs only at commit time, where the system knows about node ids, 
rather than at the time the invalid operation was performed.
(2) Start from:

  A@10
  B@20

  Then modify A, and modify B, and try to commit.  The commit will fail 
because the two changes to the same repository node conflict.  The WC 
contains logic to detect this kind of problem before commit in the 
specific case where a switched WC node is pointing to the same URL as 
another node, but it does not know how to detect when two WC paths 
pointing to *different* URLs are in fact pointing to the same repository node.


I think the answer is that in order to make a properly move-aware system, the 
WC needs to know about node-id relationships [2].

It may be advantageous to come up with an implementation plan whereby 
move-awareness is introduced first to the repository side, with just enough 
support in the WC to make typical move scenarios work much better than they do 
today, and then as a second phase extend the WC to know about node-id 
relationships and complete the WC-side awareness (so that scenarios like the 
above commit-time failures are instead detected within the WC, and whatever 
else goes along with that).

- Julian


[1] The current WC DB schema requires O(nodes in subtree) time for a 
move internally, but that's an implementation detail and is often not 
the dominant factor.

[2] It could either know the actual node-copy-ids, or just know which pairs of 
nodes have the same node-copy-id; either would suffice.

Reply via email to