In the wiki page you started, I thought we already solved the move issue by using the original state for the source. (and remove rotate)
I didn't read this treatise, cuz I think it has an incorrect starting assumption. You state early about sequential edits, when we relaxed that a bit due to needing to retain the original state. Does it respond to the proposal from your wiki page? Thx, -g On Aug 12, 2013 10:43 AM, "Julian Foad" <julianf...@btopenworld.com> wrote: > TLDR: Ev2 move support doesn't work with the "move" and "rotate" > instructions as currently defined, and I propose that it could work instead > with separate "move-away" and "move-here" instructions. > > > == SUMMARY == > > Ev2 was designed with an intention to support “move” semantics, but the > current design of its move() method does not satisfy the requirements. Here > I explain why, and present a solution. > > The following principles constrain or guide Ev2 design: > > * An edit must be able to express any change that can be constructed in > a WC and committed. > > * An edit must be able to express any change that can be constructed in > a repo across multiple revisions. > > * Ev2 uses sequential description: a succession of incremental edits > applied to an initial state gives a final state. > > * Ev2 should not require the driver to use temporary paths to express a > change. > > Given the constraints, > > * Not all moves can be expressed by a “move source to destination” > operation, with or without a “rotate” operation. > > * Moves can be expressed, in general, using separate move-away and > move-here operations. > > While other designs are also possible, a move-away and move-here design > might be the best fit for the shape of the editor. Such a design is > detailed below, but not yet finished. > > > == WHY IS THIS NECESSARY? == > > === Express Any Change === > > Since the purpose of the editor is to communicate a change originating in > a WC when it is committed, and a change originating in a repository when > the WC is updated, then it must be able to express any such change. This > includes updates across multiple revisions, and from a mixed-revision state > to a single revision. > > Through a series of simple steps of the form “move X to Y”, some quite > funky overall changes can be created. For example, starting from this state: > > | > +-- A > | > +-- B > > the following sequence: > > move /A/B /X > move /A /X/B > move /X /A > > results in swapping A with B. If those steps are performed in a WC and the > result committed all at once, the editor needs to be able to handle it. If > those steps are committed separately, and then a working copy is updated, > the editor needs to be able to handle it. > > More examples are given later, some of which involve other operations such > as “make directory” as well as moves. All of this emergent complexity > results from the introduction of a simple “move” primitive, and there does > not seem to be any acceptable way to further constrain the basic move that > would simplify the derived complexity.Temporary Paths > > Paths in the Ev2 editor operations refer to the current state of the tree > being edited, at that moment in the sequence of edit operations. (The sole > exception is the copy-from path, which is relative to a given committed > revision.) > > A natural and compact way of expressing moves would be as a mapping from > original-state paths to final-state paths. However, that paradigm does not > fit well with the desire to express the edit as a sequence of incremental > steps. If we are going to include move functionality as steps in a sequence > of edit operations, it makes sense to use paths that are relative to the > current state. > > Ev2 should not require the driver to express a change as a sequence of > operations that can include moving a node to a temporary path and then > later moving it again to the final path. The end result of the edit is the > important part, and there are unlimited potential sequences that lead to > that result, and it does not make sense to require an edit driver to > construct such a sequence arbitrarily, if there is an alternative method > that specifies the result uniquely. The receiver may in fact need to employ > temporary paths in its implementation, but then it knows this and it is in > a better position to construct such paths when needed, and it will know > that they are temporary which may be important. > > There are advantages to placing a node in its final position exactly once. > It was claimed that Google's BigTable implementation of Svn's back-end > would have benefited from knowing that once a node has been edited then it > is in its final state. Ev2 aims to provide that guarantee, where Ev1 could > not. > > === Sequential Description === > > Ev2 (svn_editor_t) intends to express a new state in terms of an old state > and a description of the parts of the new state that differ from the old > state, or, in other words, a description of the changes against the old > state. It uses a sequential, incremental description: for example, “add > directory X, then copy file X/Y from somewhere, then edit the contents and > properties of file X/Y”. > > Only certain parts of the description are incremental. Ev2 aims to allow > nodes to be visited in arbitrary order, subject to a small number of > restrictions. The parts where ordering matters are: > * tree changes before content changes > > * alter/add/copy a directory (if required) before touching its > (immediate) children > > === A Move Event is Not Adequate === > > Moves, in general, cannot be expressed as “move from path X to path Y” > events in such a sequential description without introducing temporary > paths. This is because some cases require the source path of the move to be > moved away, then some other steps, and then the destination path of the > move can be populated. Some classic examples are: > > Example 1: Insert a directory level > > | | > +--A mv--\ (add) +--A > \ | > \--> +--B > > The add cannot happen before the move-away, but must happen before the > move-here. > > (The mirror of Example 1, which would be removing a directory level, is > not actually problematic: > > | | > +--A (del) /--> +--A > | / > +--B mv--/ > > The move-away must (in principle) happen before the delete, while the > move-here cannot (in principle) happen before the delete. However, Ev2 > defines that a deletion which is to be replaced shall occur implicitly when > the replacement is put in place, and so the move-here can happen > simultaneously with the delete.) > > Example 2: Swap two siblings > > | | > +--A mv--\ /--> +--A > | X | > +--B mv--/ \--> +--B > > Neither of the moves can be completed before doing the move-away part of > the other one. > > Example 3: Swap two directory levels > > | | > +--A mv--\ /--> +--A > | X | > +--B mv--/ \--> +--B > Neither of the moves can be completed before doing the move-away part of > the other one. > > === A Rotate Event is Not Adequate === > > At one time there was an idea that the addition of a “rotate PATH1 PATH2 … > PATHn” event would complete the semantics and allow arbitrary moves to be > supported. > > While this does enable Example 2 (swap two siblings) and Example 3 (swap > two directory levels) and many other cases, it does not help with inserting > a directory level (Example 1), and it has been shown [1] to be incapable of > resolving other more involved cases involving swapping or rotation. One > specific example is swapping A with A/B/C [2]: > > | | > +-- A mv--\ /--> +-- A > | \ / | > +-- B mv--- X ---> +-- B > | / \ | > +-- C mv--/ \--> +-- C > > [1] Email thread on dev@, “Ev2 as a move-aware editor”, started on > 2013-06-24, e.g. <http://svn.haxx.se/dev/archive-2013-06/0560.shtml> or < > http://mail-archives.apache.org/mod_mbox/subversion-dev/201306.mbox/%3C1372087321.76009.YahooMailNeo%40web186101.mail.ir2.yahoo.com%3E > >. > > [2] An example problem in thread [1], of swapping A with A/B/C: < > http://svn.haxx.se/dev/archive-2013-06/0684.shtml> or < > http://mail-archives.apache.org/mod_mbox/subversion-dev/201306.mbox/%3C87bo6rewwp.fsf%40ntlworld.com%3E > >. > > > == MOVE-AWAY AND MOVE-HERE == > > One solution is to describe the two halves of each move separately: > > move-away SOURCE-PATH > ... > move-here DESTINATION-PATH > > We can then solve Example 1 in the following way: issue the “move-away A”, > then create a new directory at path A which replaces that source of the > move, and then finally issue the “move-here A/B” which relies on that > replacement directory A having been created. > > The consumer must be able to put the node aside for an indeterminate > amount of time until the “move-here” is received. > > Of course there needs to be a way to link each move-away with the > corresponding move-here. Remembering that each edit step refers to the > current state in a sequence of states, we cannot simply specify the path > corresponding to the other end of the move like this: > > move-away SOURCE-PATH to DESTINATION-PATH > ... > move-here DESTINATION-PATH from SOURCE-PATH > > because the problem cases are when the destination path does not yet exist > at the time of a move-away, or the source path no longer exists at the time > of a move-here. What we can do is use some other unique reference that is > unique within the edit, like this: > > move-away SOURCE-PATH as identifier ID1 > ... > move-here DESTINATION-PATH from identifier ID1 > > The reference could perhaps be the destination path as it will finally > exist at the end of the edit, or just and arbitrary number or string. We > will just specify the identifier as an “id” and not specify how it is > generated. > > === Ordering Restrictions === > > The ordering rules regarding move-away and move-here should include: > * mv-away must come before the matching mv-here > > - The edit should provide a sequential procedure that the consumer > must be able to follow without having to buffer an arbitrary amount of > state. > > * mv-here & cp & add must be in nesting order: create (or put in place) > the parent before its children > > * mv-away must come before deleting a parent > > - Receiver needs to know that it must preserve this path when we > delete its parent. > > * mv-away must come before mv-away of a parent > > - If we allowed “mv-away A; …; mv-away A/B” then the child path “A/B” > would have to be specified not relative to the current state of the edit, > as all other operative paths are, but in some other way, because the parent > has gone into temporary namespace, and has perhaps been replaced so that > “A/B” now refers to some other node. > > - There is a general rule that all edits within a moved directory “A” > must come after A is moved its destination, but a mv-away of a subtree of A > is not considered an edit for this purpose. > > === Examples Solved === > > Example 1: > > | | > +--A mv--\ add--> +--A > \ | > \--> +--B > > 1. alter-dir / (children={A}) > 2. mv-away A (id=“original A”) > 3. add-directory A (children={B}) > 4. mv-here A/B (id=“original A”) > > Example 2: Swapping two siblings > > | | > +--A --\ /--> +--A > | X | > +--B --/ \--> +--B > > 1. alter-dir / (children={A,B}) > 2. mv-away A (id=“original A”) > 3. mv-away B (id=“original B”) > 4. mv-here A (id=“original B”) > 5. mv-here B (id=“original A”) > > Example 3 can also be solved in this way, except for some ordering > restriction issues that are discussed below.The Need for Alter-directory > before Tree Changes > > Why do we have this ordering rule? > > If any path is added (with add_*) or deleted/moved/rotated, then > an svn_editor_alter_directory() call must be made for its parent > directory with the target/eventual set of children. > > It can't be quite right. Inside an add-directory, all the children have to > be added, and that would imply we need an alter-directory as well as the > add-directory, violating the Once Rule. It omits “copied” – just an > oversight? It does not specify when the call must occur – presumably it > must be before any such modifications to the children. > > To remedy those three initial problems, I suppose something like this is > intended: > > Either alter_directory() or add_directory() must be called on a > directory, declaring its final set of children, before calling > delete(), move_away(), move_here(), copy(), or add_*() on any child. > > (For delete() or move_away(), it must be alter_directory(), as > the children of add_directory() must not be deleted or moved away.) > > But there is still a problem. If we require alter_directory() before a > move_away(), it leads to a violation of the Once Rule as shown in the > following example. > > Example 3: Swapping two directory levels > > | | > +--A --\ /--> +--A > | X | > +--B --/ \--> +--B > > 1. alter-dir A (children={}) ### Needed? > > 2. mv-away A/B (id=”original A/B”) > > 3. alter-dir / (children={A}) > > 4. mv-away A (id=”original A”) > > 5. mv-here A (id=”original A/B”) > > 6. alter-dir A (children={B}) ### Second alter-dir on same path, if (1) > was needed. > > 7. mv-here A/B (id=”original A”) > > There are two potential problems here: > > * We make an edit within subtree A (the “move-away A/B”) before moving A > > * The “alter-dir A” is performed twice > > The first point violates the assumed constraint that we should disallow > edits within a subtree before moving the subtree. > > The second point at face value violates the Once Rule. We can probably > resolve is by making the Once Rule apply per node rather than per path – > see below. > > === What If We Allow Edit Before Move? === > > We were assuming that we should disallow edits within a subtree before > moving the subtree. [Why?] > > One solution might be to drop that requirement and let the subtree be > moved (move-away) part way through editing it, allowing all editing > operations. To accommodate such a change, some of the other rules that > currently refer to “a path” probably need to be reformulated to refer to “a > path relative to such a subtree” instead. > > If we allow edits before and after moving, should we also allow edits > after the move-away and before the move-here? Not sure. It seems like that > may be more problematic for certain consumer architectures and so probably > should not be allowed. But is there a better way to decide? > > === What if the Once Rule is Per Node? === > > The path-based Once Rule was written something like this: > > A path should never be referenced more than once by the add_*, alter_*, > and delete > operations (the "Once Rule"). The source path of a copy (and its > children, if a directory) > may be copied many times, and are otherwise subject to the Once Rule. > The destination path > of a copy [or move_here] may have alter_* operations applied, but not > add_* or delete. If > the destination path of a copy or move is a directory, then its children > are subject to the > Once Rule. The source path of a move_away() (and its child paths) may be > replaced using > add_*() or copy() or move_here() (where these new or copied nodes are > subject to the Once Rule). > > Perhaps the Once Rule should not apply per path as it was stated, but > rather per node. If a directory is altered and then moved away, we should > be able to create a replacement directory, being a different node at the > same path, and then alter that. > > * The Once Rule applies per node, rather than per path. The definition > of a “node” is such that “move” moves an existing node (or nodes) to a new > path and does not create a new node, while “add” creates a new node and > “copy” creates a set of new nodes, each new node being different from all > other nodes that are or were in the tree even if it replaces an existing > node at the same path. > > * One of the following actions can be applied, just Once, directly to > each node: > > - create (only if it does not exist in the initial tree) > > - remove (only if it exists in the initial tree or is brought in as > a child of a copy) > > - modify (only if it exists in the initial tree or is brought in as > a child of a copy) > > * A node may be created by one of: > > - add_*() > > - copy(), optionally followed by alter_*() > > No other operations may be applied to such a node during the entire > edit. Any children may be edited after (not before) the node is created. > When a node is created by copy(), its children (recursively) are brought > into the tree, and are then subject to the Once Rule as existing nodes. > > * A node may be removed, along with all its children recursively, by one > of: > > - delete() > > - add_*() replacing this node > > - copy() replacing this node > > - move_here() replacing this node > > No other operations may be applied to such a node, nor to any children > (recursively), during the entire edit, except for nodes that have been > moved away. > > * A node may be modified by any combination from the following: > > - move_away() followed by move_here(), at most once > > - alter_*(), at most once > > - (if a directory) edit a child > > These can come in any order except that alter_*() is required before > editing any children, and neither of those can come between move-away and > move-here except for move-away of a deeper child [and/or editing of such > prior to move-away?]. > > [### This seems more complex than it could be. It would be much easier > if alter_directory() were not required before adding/moving/editing > children.] > > * The source of a copy operation may be a node in the tree being edited. > Such a node (and its children, if a directory) may be copied many times, in > addition to being subject to the Once Rule as existing nodes. > > [### I'm not sure about the Copy operation: the doc string implies > it's copying from something in the current state, but it should be able to > copy from any path-rev.] > > > == CONCLUSION == > > With a bit more work on the ordering restrictions, especially the > requirement about calling alter-dir, it looks like this design may fit the > requirements. > > Thoughts? > > - Julian > >