Hi all, I spend most of this week thinking about moves and I changed my mind as further evidence came up.
What I fully agree with is: http://wiki.apache.org/subversion/MoveDev/MoveDev#Move_Semantics Then, I found an equivalent expression for the nodeLineID concept central to the move semantics: http://svn.haxx.se/dev/archive-2013-09/0112.shtml and showed there as well as here: http://svn.haxx.se/dev/archive-2013-09/0128.shtml that the current and proposed usage of nodeID, copyID is not equivalent, i.e. cannot be used immediately as a bases for path matching. Finding Moves -------------- This brought me to realization that having move semantics (and thus identity) defined is necessary and useful but the implementation will most likely be a history scan. Following internal node chains to identify common copy roots is very time consuming and I/O inefficient. History scans are quick in f7 (10k+ revs/s) and might be even faster with 1.10 (100k+ revs/s) but you want to do only one or at most a handful of them for an update editor run. One way to do it has been proposed here: http://svn.haxx.se/dev/archive-2013-09/0136.shtml Another way would be to tell the server the full range (oldest revision) as part of the reporter initialization. The results of the scan might then even be applicable to mixed rev working copies. Storing Moves ------------- >From the FS POV, moves behave like simple ADD/DEL pairs and should, therefore, be stored as such. The differences are: * deletes due to moves must not be executed if the respective path already got replaced * there must be only one move per revision for any source as well as target path (to be checked upon transaction commit) * instead of "add"/"replace", say "move"/"movereplace" in the changes list to signify the user's intend But other than that, existing code should DTRT even in the presence of moves and backward compat APIs can simply report moves as ADD/DEL pairs without becoming inconsistent with the actual data. Later, revised FS implementations may then impose additional ID <-> operation dependencies internally. History Traversal and Moves --------------------------- This might be opt-out but most users would want to follow renames / moves even if they don't follow copies. Since both have the same internal representation, we need to look into the changed paths list to determine whether the change has been an actual ADD or a MOVE. That is virtually for free as we usually to report the changed paths for all relevant revisions anyway. Auto-move Option ---------------- It is safe to automatically treat all unique DEL(pathA) + ADD(pathA@R-1 -> pathB) pairs in revision R as moves, iff the user did not intend to treat them as separate node lines. This will not modify recorded history but only reduce the number of tree conflicts a user may see. Therefore, log and reporter API should be rev'ed to take an extra "auto_move" boolean parameter. If set, the server will attempt to report moves instead of DEL/ADD pairs with no guarantee that all or even any of the pairs will actually be detected (future backends may or may not support this option). -- Stefan^2. On Wed, Sep 11, 2013 at 5:21 PM, Julian Foad <julianf...@btopenworld.com>wrote: > While discussions continue about the "editor" and the WC side of move > tracking, I'd like to make some progress on the Repository side. > > The Wiki page > > http://wiki.apache.org/subversion/MoveDev/MoveDev#Move_Semantics > > declares the semantic model and > > http://wiki.apache.org/subversion/MoveDev/MovesInFSFS > > attempts to specify the changes needed. I think the model for move > semantics in the repository as presented there is pretty solid -- > precise and complete and having a nice set of properties. > > Can anyone review this? If anyone can take it forward in any way that > would be a great help, otherwise I'll tackle the implementation > myself. > > One issue that may be harder than it sounds at first is the concept of > 'node-line-id' rather than (node-id, copy-id) as the basis of the > definition. The point is that when we copy (ordinary copy, not move) > a directory, we lazy-copy the children, which means each child keeps > its old (node-id, copy-id) unless and until it is modified. That's > great for achieving the O(1) copy, but for move-tracking purposes each > child needs a unique "node-line-id" so its life-line can be uniquely > traced forward and back between this revision and a later revision by > which time it may have been modified and thus assigned a new copy-id. > > Clearly it would defeat the O(1) cost if we were to construct a > node-line-id explicitly for every node in the tree at copy time. Can > we instead define node-line-id such that we can compute it as needed, > from either an unmodified lazy-copied child or after such a child has > been modified, and get the same answer? Or perhaps re-state the > problem to avoid this need? > > - Julian >