Re: Design query: Ev2 and multiple moves

Bill Tutt Sat, 05 Nov 2011 21:18:18 -0700

I just know folks are going to hate me for mentioning this, but.....

Tom Lord came up with a way to store/apply arbitrary rename
applications against trees.
Apparently my google-fu failed me tonight because I couldn't find the
original reference/explanation. So bear with me as I recall the
details...

Reminder: What follows does not take into account how SVN works, and
is this written from the point of view of a generic version control
system that does support renames in all of their goriness.

Storage: You just store the renames as source/destination pairs as
appropriate for your system
Application of a series of renames:

For Each Source Part of the Rename (ordered by deepest first (DFS I think)):
   Move the source AND its contents to <tempID>/source

For Each Destination Part of the Rename (ordered by shallowest first
(BFS I think)):
   Move the item and its contents from <tempID>/source -> the
destination of the rename.

This has the advantage of never caring how many items are involved in
a dependent rename set or what order to apply them in because all
dependencies are broken.
The downside is that the process is expensive because it costs 2*(# of
renames)*(# descendants of source of rename). The descendants matter
if your system has an O(N) property for dealing with renames.

This allows for weird rename sequences like:

Assume you have a directory hierarchy:
A/B/C/D

Each of these has a file in it named: <dirname>.c
i.e.:
A/a.c
A/B/b.c
A/B/C/c.c
A/B/C/D/d.c

Additionally, we'll add some extra files in D:
A/B/C/D/e.c and A/B/C/D/f.c

Now the list of renames can begin: (Note: relative paths are listed
here for the files to indicate that their containing parent didn't
change)
A/B/C -> A/C
A/B -> A/C/B
D/d.c -> D/e.c
D/e.c -> D/f.c
D/f.c -> D/g.c
A/B/C/D -> A/C/E

Now, quick, before I give the answer, what does the destination tree look like?

Here we go:

A/a.c
A/C/c.c
A/C/B/b.c
A/C/E/e.c
A/C/E/f.c
A/C/E/g.c

These renames show some of the nastier classes of renames: Tree
Hierarchy changing ones, and simple dependent file renames where its
container name changed.

Here is the associated step by step application of the above pesudo-code:
Deepest first means:
A/B/C/D/f.c -> <temp1>/f.c
A/B/C/D/e.c -> <temp2>/e.c
A/B/C/D/d.c -> <temp3>/d.c
A/B/C/D -> <temp4>/D
A/B/C (and its contents: c.c) -> <temp5>/C
A/B (and its contents: b.c) -> <temp6>/B

Thus ends pass 1.

Now we have pass 2: (ordered shallowest first by destination)
<temp5>/C (and its contents: c.c) -> A/C
<temp6>/B (and its contents: b.c) -> A/C/B
<temp4>/D (and its contents: nothing atm) -> A/C/E
<temp1>/f.c -> A/C/E/g.c
<temp2>/e.c -> A/C/E/f.c
<temp3>/d.c -> A/C/E/e.c

Now we have the final tree structure.
This approach also handles the A -> B and B -> A swap you were talking
about Greg.

Additional fun wrinkles to watch out for:
Partial undoing of renames. (i.e. reverting a pending rename)
Partial committing of renames. (committing A/C/E/e.c but not
A/C/E/f.c) (If you allow this you undo non-committing renames on the
set of renames before applying the submitted changes.)
Partial merging of renames.
Calculating the merged rename path where the destination tree has had
hierarchy rearranging renames of its own.
Resolving rename merge conflicts in such a way as to make sense to the
user when you have an arbitrary tree structure you're merging into.

A rename that is being merged consists of a source tree rename and a
destination tree rename that you're trying to construct.
The important parts of this operation are the source tree rename
destination, the source tree rename destination parent (strdp), the
current location of the source item in the destination tree and the
current location of the strdp in the destination tree. (strdp')
The suggested destination tree destination of the merged rename will
be: strdp'\<the item's rename destination name>

e.g.: You have this tree:
A/B/C/c.c

branch A -> A'
rename A'/B -> A'/D
rename A/B/C/c.c -> A/B/d.c
merge A -> A'
This merge needs to merge the c.c rename into A'.
In this case strdp is A/B. strdp's name in branch A' is A'/D.
The suggested merged rename destination is: A'/D/d.c
Note: Just because I didn't show the two pass steps here don't mean it
doesn't need to be done in the general case. :)

The same approach is also used during checkin. (Someone could have
checked in renames behind your back) (Although you should probably
complain and say: updated, retest and then checkin)
The same approach could also be used during update. (Pulling down
someone else's rename on top of your already existing intersecting
pending renames)

How does this approach apply to Ev2? Not a clue at all.

If you get me to ApacheCon (hah!) I'd be glad to draw on Whiteboards. ;)

The above annoying cases are the major reasons that not many version
control systems properly deal with renames. Perforce doesn't even try
for example.

But Bill, can't I just prevent some of these cases from occurring
while pending changes in my working copy?
Well, yes you can, but what are you going to do in the merge case or
in the update case? Remember, anything you can do manually across more
than checkin/commit will always be combined into one checkin/commit
when you do a merge or an update.

Bill

On Sat, Nov 5, 2011 at 8:53 PM, Hyrum K Wright
<hyrum.wri...@wandisco.com> wrote:
>
>
> On Sat, Nov 5, 2011 at 7:21 PM, Greg Stein <gst...@gmail.com> wrote:
>>
>> Julian raised a question when I saw him in September: how does Ev2
>> deal with a node swap? ie. swap the contents of A and B, retaining
>> metadata that they were moves [rather than copies from history].
>>
>> In Ev2, we attempt to disallow "mv A B ; mv B C". The semantics around
>> the interface say you should just do "mv A C".
>>
>> But to accomplish an A/B swap, you must "mv A temp; mv B A; mv temp
>> B". Thus, A -> temp -> B, which is nominally disallowed in the Ev2
>> interface.
>>
>> Question: how to resolve this, given that we're starting to record
>> *move* information? (the current interface could do: mv(replacing) A
>> B; copy B@REV A)
>>
>> I've been thinking about a "multiple move" API to describe these kinds
>> of changes. It isn't just a simple swap, since you could rotate
>> content through an arbitrary set of N nodes. Suppose we had an
>> interface that specified N nodes, and node[i] is moved to node[i+1],
>> with the last one moving to node[0]. Would this be sufficient? I
>> suspect that any moves *outside* of this ring uses the standard
>> interface -- it is not participating in the rotation. Hrm. Maybe the
>> API is named "rotate" rather than multiple move. (I say rotate because
>> I'm envisioning the larger case where you have (say) 5 nodes, and each
>> piece of content rotating to the next; looks like a clock or musical
>> chairs)
>>
>> I'm not a master of topography or graphs, but I suspect that any given
>> permutation of N nodes can be reduced to a set of rotations of subsets
>> of those N nodes. Thus, a general "rotate" API should be sufficient.
>>
>> Cheers,
>> -g
>>
>> ps. there is a corollary, unstated question of whether these kinds of
>> rotations/multiple-move can be recorded in our WC schema.
>
> I'll do some thinking on this topic, but if you've got some time next week
> during ApacheCon, it might be useful to have a sit down and hash some of
> this out.
> -Hyrum "down to 1154 test failures" Wright
>
> --
>
> uberSVN: Apache Subversion Made Easy
> http://www.uberSVN.com/
>

Re: Design query: Ev2 and multiple moves

Reply via email to