Re: Moves in FSFS

Julian Foad Wed, 11 Sep 2013 11:40:06 -0700

Thanks, Stefan!  Very interesting thoughts.  Especially... (scroll down)...


Stefan Fuhrmann wrote:
> On Wed, Sep 11, 2013 at 5:21 PM, Julian Foad wrote:
>>    http://wiki.apache.org/subversion/MoveDev/MoveDev#Move_Semantics
>>    http://wiki.apache.org/subversion/MoveDev/MovesInFSFS
>> [...]
>> 
>> One issue that may be harder than it sounds at first is the concept of
>> 'node-line-id' rather than (node-id, copy-id) as the basis of the
>> definition.  The point is that when we copy (ordinary copy, not move)
>> a directory, we lazy-copy the children, which means each child keeps
>> its old (node-id, copy-id) unless and until it is modified.  That's
>> great for achieving the O(1) copy, but for move-tracking purposes each
>> child needs a unique "node-line-id" so its life-line can be uniquely
>> traced forward and back between this revision and a later revision by
>> which time it may have been modified and thus assigned a new copy-id.
>> 
>> Clearly it would defeat the O(1) cost if we were to construct a
>> node-line-id explicitly for every node in the tree at copy time.  Can
>> we instead define node-line-id such that we can compute it as needed,
>> from either an unmodified lazy-copied child or after such a child has
>> been modified, and get the same answer?  Or perhaps re-state the
>> problem to avoid this need?
> 
> I'm currently bogged down in svnlive prep work but here is my quick feedback;
> more to come next week.
> 
> Bottom line: looks ok, especially the API seems fine and performance in f7
> should be acceptable even if it is O(changes in [rA .. rB]).
> 
> General observations:
> 
>* We need a format bump for the extra "M" entries in the changed path
>  lists, potential "lazy" markers in the tree etc. But that is not a problem
>  as the log-addressing branch probably gets merged in about 2 weeks
>  time and bumps to f7 anyway. It also brings the infrastructure for
>  "mixed addressing" such that we may introduce extended structures
>  in existing repositories without touching existing revisions.
>
>* Existing copy&del pairs will not be treated as move since the node-line-id
>  does not match. Maybe, we can add some intelligence to 'svnadmin load'.
>
>* A copy effectively destroys all move relationships below it. That seems
>  unfortunate (say, you duplicate a project) but the solution to that would
>  probably require hierarchical IDs ("match IDs within the context of this
>  sub-tree").

That's a good observation.  Here's an example, to clarify:

  r10: trunk/foo

move foo to bar

  r20: trunk/bar

copy trunk to branch1

  r30: trunk/bar
       branch1/bar

Now request "svn diff -r10:30 branch1".  It would be useful if Subversion could 
say trunk/foo@10 moved to branch1/bar@30 in the context of this diff.  (Where I 
say "diff" we can also substitute "update", "merge", and so on.)

This only makes sense for a copy at or above the root path of the requested 
diff.  In this example, it makes sense for "diff -r10:30 branch1" and for "diff 
-r10:30 branch1/bar".  It does not make sense across a copy that happened below 
the target: in this case, "diff -r10:30 ^/" would NOT be expected to show 
foo@10->bar@30 as a move.
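
To make the "at or above the root path of the requested diff" rule concrete, here is a minimal Python sketch of the example. It is only a toy model working on path strings, not a proposal for the FS API: the helper names (split_path, is_ancestor_or_self, report_move) and the "moves" dictionary are hypothetical and exist only for this illustration.

```python
def split_path(path):
    """Split a repository-relative path into components; "" is the root."""
    return [part for part in path.split("/") if part]


def is_ancestor_or_self(ancestor, path):
    """True if 'ancestor' is 'path' itself or one of its parent directories."""
    a, p = split_path(ancestor), split_path(path)
    return p[:len(a)] == a


def report_move(diff_target, path_at_end, copy_src, copy_dest, moves):
    """Return the start-revision path that 'path_at_end' was moved from,
    or None if the move should not be reported for this diff target.

    Assumes 'path_at_end' lies inside 'copy_dest'.  'moves' maps a path
    on the copy-source side to the path it was moved from (hypothetical
    stand-in for whatever the FS layer would record).
    """
    # The copy is only followed if it happened at or above the diff target.
    if not is_ancestor_or_self(copy_dest, diff_target):
        return None
    # Map the path inside the copy back to the corresponding source path.
    relative = split_path(path_at_end)[len(split_path(copy_dest)):]
    origin = "/".join(split_path(copy_src) + relative)
    return moves.get(origin)


# r10..r20: trunk/foo moved to trunk/bar; r30: trunk copied to branch1.
moves = {"trunk/bar": "trunk/foo"}

print(report_move("branch1", "branch1/bar", "trunk", "branch1", moves))      # trunk/foo
print(report_move("branch1/bar", "branch1/bar", "trunk", "branch1", moves))  # trunk/foo
print(report_move("", "branch1/bar", "trunk", "branch1", moves))             # None
```

The last call corresponds to "diff -r10:30 ^/": the copy happened below the target, so no move is reported.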

One way of looking at this is that our history-tracing that's used to find 
"-r10 branch@30" in such scenarios is *already* following copies at the root of 
the subtree as if they are moves, and in a way this would be extending that 
idea.

This seems like functionality that should be provided in a higher layer; the FS 
layer just needs to provide some primitive queries to make this possible.  I'm 
not sure what, exactly.


>* Support for resurrection of deleted nodes *without* destroying any move
>  relationship is potentially expensive but I think we should support this
>  early on (maybe not in 1.9 but def. in 1.10). People just happen to delete
>  their /trunk once in a while and you don't want to tell them that *now* they 
>  actually managed to break something ...

Yes.

>  Proposal: Resurrect keeping the old node-line IDs, iff
>  (a) the copy source (or a parent) got deleted in the next revision
>  (b) no copies of that node (or any parent) were added since the source rev.
>  That should keep normal copying relatively cheap and still provide the
>  special behavior for our "undo" use-case.

I think you mean that the user-level copy should do this automatically.  
Perhaps so.  That need not be implemented in the FS layer; the FS layer could 
just provide the primitives necessary to implement it.
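
As a rough illustration of how a higher layer might evaluate conditions (a) and (b), here is a small Python sketch. It reflects only my reading of the proposal: the Change record, the function names, and the idea of passing in a flat list of changes are all hypothetical, and the list is assumed to exclude the resurrecting copy itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One recorded change (hypothetical toy record, not an FS structure)."""
    rev: int
    kind: str   # "delete" or "copy"
    path: str   # deleted path, or for "copy" the copy *source* path


def covers(changed_path, node_path):
    """True if 'changed_path' is 'node_path' itself or one of its parents."""
    c = [part for part in changed_path.split("/") if part]
    n = [part for part in node_path.split("/") if part]
    return n[:len(c)] == c


def keep_node_line_ids(source_path, source_rev, changes):
    """Decide whether copying source_path@source_rev should count as a
    resurrection that keeps the old node-line IDs."""
    # (a) the copy source (or a parent) got deleted in the next revision
    deleted_next = any(
        ch.kind == "delete" and ch.rev == source_rev + 1
        and covers(ch.path, source_path)
        for ch in changes)
    # (b) no copies of that node (or any parent) since the source revision
    copied_since = any(
        ch.kind == "copy" and ch.rev > source_rev
        and covers(ch.path, source_path)
        for ch in changes)
    return deleted_next and not copied_since


# Someone deletes /trunk in r41 and wants to restore it from trunk@40.
history = [Change(41, "delete", "trunk")]
print(keep_node_line_ids("trunk", 40, history))  # True
```

An ordinary copy of a path that was never deleted fails condition (a) immediately, which is what keeps normal copying cheap in this model.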

> I'd like to implement that - after some more in-depth review - and would even
> be willing to postpone the cache-server feature to 1.10 because move tracking
> is much more important atm.

Wonderful!

- Julian
