Re: Semantics of Move

Julian Foad Tue, 03 Sep 2013 09:02:13 -0700

On 2013-08-16, Blair Zajac wrote:
> Julian Foad wrote:
>>>> On 08/15/2013 08:01 AM, Julian Foad wrote:
>>>>> I propose the following logical semantics of the versioned move
>>>>> operation that is the basis of move tracking, independent of any
>>>>> implementation.
>>>>> 
>>>>> A versioned move of the node with node id "N", with respect to two
>>>>> revisions rX and rY (X < Y), shall mean:
>>>>> 
>>>>>   * Same node id. A node with node id N exists in rX and in rY. It
>>>>>   is "the same node". It therefore has the same node kind. It may have
>>>>>   content modifications.
>> 
>> The key that I am referring to in the description of moves is (node-id,
>> copy-id).  Perhaps I will call it "node-copy-id" for short.  I just
>> checked and that term currently does not appear in the source tree.  I'll
>> update my docs.  Thanks for pointing out the confusion


In discussion yesterday I realized that the (node-id, copy-id) pair from the 
filesystem is still not what I want to refer to in this definition of 
semantics, because of lazy copies.  When we copy a directory, we don't assign a 
new copy-id to each child right away, but only if and when the child is 
modified.  I want to define the move semantics in terms of some (conceptual) id 
that is always new for each copied path, including all the unmodified children 
of a copied directory.

Therefore I've reworked the definition and also expanded and rewritten some 
other parts of it.  The result is at 
<http://wiki.apache.org/subversion/MoveDev/MoveDev#Move_Semantics> and repeated 
in plain text below.


----------

This section specifies the logical semantics of the versioned move operation 
that is the basis of move tracking, independent of any implementation.


== Node-Line-Id ==

Assume that each versioned path in a revision has an identifier that we will 
call its node-line-id.  The node-line-id need not physically exist: it is a 
concept used in the definition of moves but not necessarily in the 
implementation.

The node-line-id is preserved when the content of a node is modified.  A new 
node-line-id is assigned to every new node that is created by addition or by 
copying, including when it replaces a previous node at the same path.  This new 
node-line-id is unique within the whole repository.  Within any given revision, 
each node-line-id is unique among all the paths.

The node-line-id is similar to the (node-id, copy-id) tuple in the existing 
Subversion filesystems, except that the lazy-copy mechanism does not assign a 
new copy-id to a child of a copy until that child (or one of its descendants) 
is modified.  Therefore an unmodified child of a copy has the same (node-id, 
copy-id) as the corresponding child path of the copy source, whereas (by 
definition) it has a new node-line-id.

Let the term node-line refer to the set of PATH@REV locations that have a given 
node-line-id.


== Definition of Move ==

Given a node-line N and two revisions rX and rY (X < Y), the definition of N 
being moved in rY with respect to rX is:

    * Same node-line.  A node-line with node-line-id N exists at path PN,X in 
rX and at path PN,Y in rY.  It is the same node-line, and so has the same node 
kind.  Its content may differ.

    * Move and/or rename.  The node-line N has either or both of: a different 
name (base name) in rX than in rY; and/or a different parent (parent directory 
node-line-id) in rX than in rY.  Thus, the paths PN,X and PN,Y typically differ 
but may be the same.

    * No gap.  There cannot be a gap in the range of revisions: node-line-id N 
exists in every revision rX, rX+1, ..., rY-1, rY.  Contrast with 
copy-and-delete, where there can be a gap between the delete and the copy.  The 
possible "resurrection" extension to the move semantics would permit a gap.

    * Children follow.  If N is a directory, each child (recursively) of N in 
rX remains a child of N in rY, with the same name, unless it is separately 
moved or deleted.  Any or all of the children can be separately moved within or 
outside the subtree at N, at the same time as N is moved.


== Properties of Move ==

Some properties of the move relationship are:

    * Unique. move(A@X,B@Y) and move(A@X,C@Y) cannot both be true.

    * Transitive. move(A@X,B@Y) followed by move(B@Y,C@Z) collapses to 
move(A@X,C@Z).

    * Reversible. move(A@X,B@Y) followed by move(B@Y,A@Z) collapses to no-move.

    * Time-symmetric. move(A@X,B@Y) is symmetric with the time-reversed 
relationship  move(B@Y,A@X).

    * No null move. An attempted move which does not change the node's name or 
its parent node, with or without a modification, is not distinguished from a 
normal succession of history.

In the notation A@X, A represents a parent directory node-line-id and child 
name (rather than a full path), X represents a revision number, and A != B, and 
X != Y, etc.


== Move versus Rename ==

In the versioned data model semantics, "move" refers to a change of parent node 
and/or a change of name.

At a higher level of semantics, for example when resolving conflicts during 
merge, it can be useful to distinguish between renaming and moving to a 
different parent node.


== Can't Move a Child of a Copy ==

Moving a child of a copy, within the same revision, is not tracked: it is an 
unversioned operation.

A versioned "move" takes a node that existed in the previous revision and 
places it in a new location.  A copy, however, always creates new nodes, 
conceptually, even if the internal representation is a "lazy copy" pointer to 
the old node.  Moving a child node therefore is a rearrangement of the new 
content.  It is semantically the same as deleting the child node and creating a 
copy of it somewhere else.  Compare with copying a node and then moving that 
copy somewhere else.

If we perform a copy and then move a child of it, either in a WC or in a 
repository, this should create a copy with a deleted child, and then another 
copy somewhere else which is the "moved" child in its new location.  We can 
describe the relationship between the initial and final states perfectly well 
without saying "move", in the form "the subtree at path P@X is copied to path 
P1@Y, except for its child C which is copied to path P2/C@Y instead".  Thus 
there is no loss of semantic information despite the absence of "move" in the 
result.


== Resurrecting a Deleted Node ==

A possible extension to the move semantics would be to allow a previously 
deleted node to be resurrected at the same location or at a different location.

The mental model is this.  When a node is deleted, it is unlinked from the 
versioned tree, but its content continues to exist in the repository.  When the 
node is resurrected, a link to that last version of its content is put into the 
versioned tree, like undoing the delete.  The new link can be made anywhere in 
the versioned tree, like undoing the delete and moving the node at the same 
time.

This is not merely a blue-sky curiosity: it may necessary in order to ensure 
logical completeness.  For example, let's say the node initially at path A/foo 
is moved to B/foo and then back to A/foo.  If we create a branch C from A, and 
are continuously merging all changes under A into C, then C/foo will be deleted 
and will later be recreated.  Or if 'svnsync' is replicating only subtree 'A' 
of repository R1 into repository R2, then in repository R2 we will see A/foo 
disappear and then reappear.  In cases such as these, the 'reappearance' should 
be modeled as a resurrection; if it is modeled as a plain add or as a copy, it 
will not have the correct semantics of being the 'same' node that it was 
before. 

------------------------------


- Julian

Re: Semantics of Move

Reply via email to