Re: Moves in FSFS

Julian Foad Tue, 17 Sep 2013 04:01:15 -0700

>> Branko Čibej <br...@wandisco.com> writes:
>>> That said, I still do not understand why a different ID would be needed
>>> before the copy-on-write happens. Is it because the client doesn't have
>>> the full history available? [...]

Hi Brane.

Ref. <http://wiki.apache.org/subversion/MoveDev/MoveDev#Move_Semantics>.

A different id (than node-id+copy-id) is needed because I prefer to describe
the semantics of moves (on the server) in such a way, not because of anything
to do with the client side, nor anything to do with existing or potential
editor designs.

Some important move-tracking query APIs are ones which will map between paths 
in one revision and "corresponding" paths in another revision.  For these 
purposes I believe the abstract model we need to present is one in which 
copying a directory creates new lines of history ("node-lines") for all nodes 
in the subtree, even though their content may not have diverged yet.  In other 
words, for the purposes of a query that asks "where" (at what path) in revision 
REV2 we would find the node corresponding to that at PATH1@REV1, it should 
behave the same *as if* copies were always full copies and never lazy.  
(Conversely, in merging, for the purpose of finding a common ancestor of 
changes to be merged, it may be easier to work with the late branching / 
lazy-copy model, as the nearest common ancestor can be nearer.)

The "node-line" concept is merely a tool to aid in the definition of the 
semantics.  I am not suggesting here that the node-line-id should be 
transmitted to the client or used in the editor APIs.  (Those are separate 
discussions in which we may or may not want to use such a concept.)

Do you object to my using this invented concept as a tool within the semantic 
specification, or do you object to this abstract concept being made concrete 
and stored and exposed?  Do you disagree with the semantics I defined, or find 
it hard to interpret, or is it that you would prefer to describe the same thing 
in a different way?  I'm not clear.

The same semantics could of course be defined in other ways.  However the 
definition as I've written it clearly doesn't work if we just write (node-id, 
copy-id) in place of (node-line-id).  Here is an example of how it makes a 
difference.

Start with

  r10:
    trunk
      /A
      /B

branch the trunk:

  r20:
    trunk
      /A
      /B
    branch
      /A (pointer to /trunk/A)
      /B (pointer to /trunk/B)

modify branch/A:

  r30:
    trunk
      /A
      /B
    branch
      /A
      /B (pointer to /trunk/B)

Now let's say we're diffing branch@20 and branch@30.  I want to be able to
report a mapping between each path in branch@20 and the path in r30
corresponding to "the same node", where "the same node" is to be defined in
some way that makes sense for tracking moves.  In this simple example, there
are not even any moves, and so I want the move-tracking code to be able to
deduce the following 1:1 path-mapping between branch@20 and branch@30:

  PATH@20        PATH@30
  branch    <->  branch
  branch/A  <->  branch/A
  branch/B  <->  branch/B

It certainly must not report a simple (node-id, copy-id) correspondence, 
because that would look something like:

  PATH@20        PATH@30
  branch    <->  branch
  branch/A  <->  trunk/A  # or (nil) as it's out of tree-scope
  (nil)     <->  branch/A
  branch/B  <->  branch/B

which breaks the mapping between branch/A@20 and branch/A@30.
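
To make the difference concrete, here is a rough Python sketch of the two
mappings above.  Everything in it is made up for illustration: map_paths()
is a hypothetical helper, not an existing FS API, and the id values ("n2",
"c0", "L5", ...) are invented.  In particular, the way copy-ids are assigned
here is only an assumption that reproduces the lazy-copy picture above; it
is not meant to describe exactly what FSFS does.

```python
def map_paths(tree1, tree2):
    """Pair up the paths in tree1 and tree2 that carry the same id.

    Each tree is a dict {path: id}; ids are assumed unique within a tree.
    Returns (path1, path2) pairs, with None where there is no counterpart.
    """
    path_by_id = {node_id: path for path, node_id in tree2.items()}
    pairs = [(p, path_by_id.get(i)) for p, i in sorted(tree1.items())]
    matched = {p2 for _, p2 in pairs if p2 is not None}
    pairs += [(None, p) for p in sorted(tree2) if p not in matched]
    return pairs

# Invented (node-id, copy-id) pairs under lazy copying: branch/A@20 still
# carries trunk/A's id, and only gets its own copy-id when modified in r30.
lazy_r20 = {"branch": ("n1", "c1"),
            "branch/A": ("n2", "c0"),
            "branch/B": ("n3", "c0")}
lazy_r30 = {"branch": ("n1", "c1"),
            "branch/A": ("n2", "c1"),
            "branch/B": ("n3", "c0")}

# Invented node-line ids: every node in the copied subtree gets its own
# line of history at copy time, *as if* the copy had been a full copy.
line_r20 = {"branch": "L4", "branch/A": "L5", "branch/B": "L6"}
line_r30 = {"branch": "L4", "branch/A": "L5", "branch/B": "L6"}

by_copy_id = map_paths(lazy_r20, lazy_r30)
by_node_line = map_paths(line_r20, line_r30)
```

With these inputs, by_node_line is the 1:1 mapping I want, while by_copy_id
loses the correspondence for branch/A in both directions.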

Hi Philip.

Branko Čibej wrote:
> Philip Martin wrote:
>> Another way to provide the moves between arbitrary revisions is to have
>> an id to path map per revision which allows the FS to find the path
>> associated with a given id.  However with lazy-copy this map is harder
>> to implement.

Harder in the sense that a naive map from each node-line-id to each reachable 
path in the revision would require adding N entries to the map when copying a 
subtree of N nodes, thus making copy no longer O(1).  To maintain O(1) copies 
we'd need something cleverer.
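
As a toy illustration of that cost, here is a Python sketch of such a naive
map.  The class and method names are hypothetical and nothing like this
exists in FSFS; the only point is that copy_subtree() has to add one map
entry per node in the copied subtree.

```python
import itertools

class NaiveIdPathMap:
    """Toy per-revision map from node-line-id to path (hypothetical)."""

    def __init__(self):
        self.path_of = {}                  # node-line-id -> path
        self._next_id = itertools.count(1)

    def add(self, path):
        """Register a new node at PATH under a fresh node-line-id."""
        node_line_id = next(self._next_id)
        self.path_of[node_line_id] = path
        return node_line_id

    def copy_subtree(self, src, dst):
        """Copy SRC to DST, giving every copied node a fresh node-line-id.

        Returns the number of entries added, which equals the number of
        nodes in the subtree: this is what makes the copy O(N).
        """
        sources = [p for p in self.path_of.values()
                   if p == src or p.startswith(src + "/")]
        for p in sources:
            self.add(dst + p[len(src):])
        return len(sources)

id_map = NaiveIdPathMap()
for path in ["trunk", "trunk/A", "trunk/B"]:
    id_map.add(path)

entries_added = id_map.copy_subtree("trunk", "branch")
```

Copying the three-node trunk adds three entries; a cleverer scheme would have
to avoid touching every node at copy time.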

In my present definition of move semantics, the ids used in this map would be 
what I call "node-line" ids, not the raw (node-id, copy-id) pairs.  How 
copy-ids work is thus irrelevant to me.  (Reading between the lines, I think 
with your questions about how copy-id assignment works you meant to question 
how copy-id could possibly be used to answer move tracking queries, whereas 
Brane answered them as direct questions about how copy-id assignment currently 
works.)

- Julian
