[RFC] Move tracking - design summary and mock-up proposal

Julian Foad Tue, 20 May 2014 07:27:12 -0700

Hi, folks.

At last I have a plan for move tracking. I want to share this plan and get
your feedback.


I want to start implementing it, but first, as Philip suggested and I agree,
we should do a mock-up and see how the high-level behaviour pans out in
typical scenarios involving moves, especially merging with moves. There's a
lot of work here and implementing it would be risky unless we first check
whether it will deliver useful results.

The mock 'svn' client will implement the essence of the design, but
initially the move info will just be explicit user input, supplied whenever
the code requires it. The next step could be to store and retrieve the move
info in any form whatsoever (revprops, versioned props, even off-line text
files). Then we can try out the scenarios that we imagine users will expect
to work, and see whether the design does in fact make these things easy. An
example:

  0 make branches A and B with some content
  1 in branch A, rename a directory and all the files inside it
  2 in branch B, modify some files
  3 merge "automatically" from A to B
  4 in branch A, make some text mods
  5 merge "automatically" from A to B -- check that mods from A go to the
    right files in B with no user intervention
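
A rough Python sketch of how a mock-up could exercise this scenario. It is
only an illustration under stated assumptions, not the proposed
implementation: a branch is modelled as a dict from a hypothetical element
id to (parent id, name, text), the merge is a naive per-field three-way
merge, and the merge base for step 5 is assumed to be the state of A that
was last merged.

```python
# Hypothetical model: branch = {eid: (parent_eid, name, text)}.
# All names here are illustrative; none come from Subversion.

def merge_field(base, src, tgt):
    """Three-way merge of one value: take whichever side changed."""
    if src == base:
        return tgt
    if tgt == base or src == tgt:
        return src
    raise ValueError("conflict: both sides changed")

def merge3(base, src, tgt):
    """Merge the change base->src into tgt, matching by element id."""
    result = {}
    for eid in set(base) | set(src) | set(tgt):
        b, s, t = base.get(eid), src.get(eid), tgt.get(eid)
        if None in (b, s, t):
            merged = merge_field(b, s, t)      # whole-element add or delete
        else:
            merged = tuple(merge_field(*f) for f in zip(b, s, t))
        if merged is not None:
            result[eid] = merged
    return result

def path(branch, eid):
    """Path of an element relative to the branch root."""
    parent, name, _ = branch[eid]
    if parent is None:
        return name
    return (path(branch, parent) + "/" + name).lstrip("/")

# Step 0: branches A and B start with the same content.
base0 = {0: (None, "", None), 1: (0, "docs", None),
         2: (1, "readme.txt", "r1"), 3: (1, "notes.txt", "n1")}
# Step 1: in A, rename the directory and the files inside it.
a1 = dict(base0)
a1[1] = (0, "doc", None)
a1[2] = (1, "README", "r1")
a1[3] = (1, "NOTES", "n1")
# Step 2: in B, modify a file.
b2 = dict(base0)
b2[3] = (1, "notes.txt", "n1 + B edit")
# Step 3: merge A -> B.
b3 = merge3(base0, a1, b2)
# Step 4: in A, make a text mod.
a4 = dict(a1)
a4[2] = (1, "README", "r1 + A edit")
# Step 5: merge A -> B again, using the last merged state of A as the base.
b5 = merge3(a1, a4, b3)
print(path(b5, 2), "=", b5[2][2])
print(path(b5, 3), "=", b5[3][2])
```

Because elements are matched by id rather than by path, the text mod from A
lands on the renamed file in B, and B's own edit survives the rename.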

I'd prefer to do the mock-up in Python if we can. Does that seem
reasonable, or should I branch and hack the C code instead?


== DESIGN ==

I had an opportunity to discuss the design in person with Ben, Brane,
Philip and Stefan these last couple of weeks, which resulted in a good few
steps forward. I have been going through some variations, but here is more
or less the design in summary.


a BRANCH ELEMENT
  - is, immutably, a file or a directory (or a symlink in future);
  - has a separate but related life-line in each branch in the same branch
    family;
  - has, in each rev in each branch, a parent directory element within the
    same branch (except for the branch root element);
  - has, in each rev in each branch, a name within its parent element.

An instance of an element in a particular branch can, at each revision, be
moved to a new parent element and/or given a new name, as long as it stays
within its branch. It cannot be moved outside its branch or into a nested
branch.
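
As a minimal sketch of this data model (the class and field names are
hypothetical, chosen only for illustration), an element could carry its
immutable kind, while its parent and name are recorded separately per
branch and per revision:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Element:
    """Hypothetical element record: the kind never changes."""
    eid: int
    kind: str            # "file" or "dir"

@dataclass(frozen=True)
class Placement:
    """Where an element sits in one branch at one revision."""
    parent_eid: Optional[int]   # None only for the branch root element
    name: str

# history[(branch, rev, eid)] -> Placement (assumed layout, for illustration)
History = Dict[Tuple[str, int, int], Placement]

readme = Element(eid=7, kind="file")
history: History = {
    ("trunk", 1, 7): Placement(parent_eid=1, name="README"),
    ("trunk", 2, 7): Placement(parent_eid=1, name="README.txt"),  # renamed in r2
    ("br1", 2, 7): Placement(parent_eid=4, name="README"),        # own life-line
}
```

The same element id appears in both branches, each with its own sequence of
placements.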


a BRANCH
  - is rooted at exactly one element (usually a directory);
  - is the set of trees rooted at its root element in all revisions at
    which the root element exists;
  - is a member of exactly one branch family.

The branch root element is distinguishable from non-branch-root elements
(at the repository level and at the client level).

The repository root directory is implicitly the root element of a singleton
top-level branch. (The top-level branch is not particularly interesting to
the user.) Every other branch root element is a non-root element of a
branch at the next higher branching level.

A branch is created in one of two ways:
  - creating a new element and explicitly designating it as the root
    element of a (first) branch of a new branch family; or
  - branching (see below) the root element of an existing branch to create
    a new branch of an existing branch family.


a BRANCH FAMILY
  - is a set of branches of the "same" tree;
  - results from initially creating a new branch and then directly and
    transitively branching it, including by branching a higher-level branch;
  - can be nested inside another branch family, in a strict hierarchy.

When branches are nested, the nesting topology is uniform: if one branch of
family F is inside a branch of family G, then every branch of family F is
inside a branch of family G.

e.g. the outer family contains branches {trunk, br1, ...} and the nested
family contains branches {fs_fs, fs_x}:

^/.../trunk/
    +--- subversion/libsvn_fs_fs/  ...
    +--- subversion/libsvn_fs_x/  ...

^/.../branches/br1/
    +--- subversion/libsvn_fs_fs/  ...
    +--- subversion/experimental/libsvn_fs_x/  ...
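
The uniformity rule can be stated as a small check. This is a sketch only:
the representation of a branch as (family, enclosing branch) and the branch
names are assumptions made for the example, and the top-level branch is
simplified to None.

```python
def nesting_is_uniform(branches):
    """branches: name -> (family, name of enclosing branch or None).

    Returns True if every branch of a given family is enclosed by
    branches of one and the same outer family.
    """
    outer_family = {}
    for name, (family, outer) in branches.items():
        fam_of_outer = branches[outer][0] if outer is not None else None
        if outer_family.setdefault(family, fam_of_outer) != fam_of_outer:
            return False
    return True

# Family F (the nested libraries) always sits inside family G.
uniform = {
    "trunk": ("G", None),
    "br1": ("G", None),
    "trunk/libsvn_fs_fs": ("F", "trunk"),
    "br1/libsvn_fs_fs": ("F", "br1"),
}
# Here one branch of F is inside family G and another inside family H.
mixed = {
    "trunk": ("G", None),
    "other": ("H", None),
    "trunk/libsvn_fs_fs": ("F", "trunk"),
    "other/libsvn_fs_fs": ("F", "other"),
}
print(nesting_is_uniform(uniform), nesting_is_uniform(mixed))
```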

There is no fundamental distinction at the repository level between the
"original" branch and any other branch in the same family, nor a
directional relationship implied by the order in which subsequent branches
are added to the family. The "original", and the order of subsequent
branching, are only distinguishable by looking at client-level metadata
(the "copy-from" metadata and/or mergeinfo; see "copying").


MOVING

An element of a branch can be moved to a new parent element and/or a new
name within its own branch, but cannot be moved into a nested branch or out
of its own branch. In other words, a move cannot cross a branch boundary.

The same applies to moving a whole branch, which is achieved by moving its
root element to a new parent and/or name within the outer branch.
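
A sketch of how a mock-up might enforce the boundary rule. The class, its
fields and the element ids are hypothetical. Each element records the
branch it is a member of, and a nested branch root additionally records the
branch rooted at it. Cycle checks (moving a directory under itself) are
left out.

```python
class MoveError(Exception):
    pass

class Tree:
    """Hypothetical in-memory model of one revision."""

    def __init__(self):
        self.owner = {}    # eid -> branch the element is a member of
        self.rooted = {}   # eid -> branch rooted at this element, if any
        self.place = {}    # eid -> (parent_eid, name)

    def add(self, eid, parent, name, owner, rooted=None):
        self.owner[eid] = owner
        self.place[eid] = (parent, name)
        if rooted is not None:
            self.rooted[eid] = rooted

    def child_branch(self, parent):
        """Branch that children of 'parent' belong to."""
        return self.rooted.get(parent, self.owner[parent])

    def move(self, eid, new_parent, new_name):
        if self.child_branch(new_parent) != self.owner[eid]:
            raise MoveError("move would cross a branch boundary")
        self.place[eid] = (new_parent, new_name)

t = Tree()
t.add(0, None, "trunk", owner="top", rooted="trunk")
t.add(1, 0, "subversion", owner="trunk")
t.add(2, 1, "libsvn_fs_fs", owner="trunk", rooted="fs_fs")  # nested branch root
t.add(3, 2, "fs.c", owner="fs_fs")
t.add(4, 0, "README", owner="trunk")
t.add(5, 1, "experimental", owner="trunk")

# Moving a whole nested branch within the outer branch is allowed.
t.move(2, 5, "libsvn_fs_fs")

# Moving an outer element into the nested branch is rejected.
try:
    t.move(4, 2, "README")
except MoveError as err:
    print("rejected:", err)
```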


COPYING versus BRANCHING

At the repository level, what we have previously called the "copy"
relationship is only useful as a branching relationship. The "copy-from"
information is client-level metadata, not needed by the repository itself,
and so shall be stored elsewhere. Thus, at the repository level, copying is
branching.

At the client level, branches are conceptually distinguished, and so the
terms "copying" and "branching" are naturally distinguished: copying makes
a new element (or tree) that is not a branch, and branching makes a new
branch in the same family as an existing branch.

Normally a user would only ever branch a branch, and copy a non-branch. In
practice it would be unusual to want to "copy" a branch so as to make a new
non-branch element or tree, or to create a new branch as a copy of a
non-branch, but these variants could be provided for advanced users.

A suggested user interface would have
  - "branch" applied to a branch (root) creates a new branch in the same
    branch family;
  - "copy" applied to a non-branch creates a new non-branch tree as a copy
    of the source;
  - "branch" applied to a non-branch or "copy" to a branch is an error,
    except in advanced usage.

An alternative user interface could have just one operation distinguished
by the specified source:
  - "copy" applied to a branch (root) creates a new branch in the same
    branch family;
  - "copy" applied to a non-branch creates a new non-branch tree as a copy
    of the source.

In any case, the new branch or non-branch shall be tagged with client-level
metadata saying where it was copied from.
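
The two suggested interfaces could be mocked up roughly as follows. The
function names, the "advanced" flag and the returned tuple are all
hypothetical; the sketch only shows the dispatch rules and the recording of
copy-from metadata.

```python
def copy_or_branch(op, source, is_branch_root, advanced=False):
    """First interface: separate "branch" and "copy" operations.

    Returns (kind of result, copy-from) or raises ValueError when the
    operation does not suit the source and advanced usage is off.
    """
    if op not in ("branch", "copy"):
        raise ValueError("unknown operation: " + op)
    natural = "branch" if is_branch_root else "copy"
    if op != natural and not advanced:
        raise ValueError('"%s" needs advanced usage on this source' % op)
    kind = "branch" if op == "branch" else "non-branch tree"
    return kind, source     # source is kept as client-level copy-from metadata

def copy_auto(source, is_branch_root):
    """Alternative interface: one "copy", decided by the kind of source."""
    op = "branch" if is_branch_root else "copy"
    return copy_or_branch(op, source, is_branch_root)

print(copy_or_branch("branch", "^/trunk", True))
print(copy_auto("^/trunk/README", False))
```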


LIFE-LINES and RESURRECTION

The life-line of an element on a given branch is a subset of the life-line
of its branch. At each revision in the life-line of its branch, the element
can exist (at an arbitrary name and parent element within the branch) or
not exist.

An element can be resurrected: that is, after being deleted, it can be
brought back into existence on the same branch, as the same element, and
not just as a new element copied from it. This is essentially the same as
what must happen when we merge the creation of this element to any other
branch on which it is not currently alive, and so it is a natural part of
branching and merging. (Resurrection is a necessary consequence of the fact
that, for self-consistency, it must be possible to reverse-merge any
change. Reverse-merging a deletion, whether onto the same branch or onto
another branch, must resurrect an instance of the same element that was
deleted, and not just create a copy of it.)

As a branch root is an element of an outer branch, it follows that each
branch itself can be deleted and resurrected. A branch family can therefore
have zero or more of its branches in existence at any one time.
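
A small sketch of a life-line with resurrection, assuming a hypothetical
per-branch record of each element's placement at each revision where it
changed (None meaning "deleted"). The reverse merge here is deliberately
naive: it restores the prior state and does no conflict checking.

```python
class Branch:
    """Hypothetical single-branch history keyed by element id."""

    def __init__(self):
        self.rev = 0
        self.life = {}   # eid -> {rev: (parent_eid, name) or None}

    def commit(self, changes):
        """changes: eid -> new placement, or None to delete the element."""
        self.rev += 1
        for eid, placement in changes.items():
            self.life.setdefault(eid, {})[self.rev] = placement
        return self.rev

    def at(self, eid, rev):
        """Placement of eid at rev, or None if it is not alive there."""
        revs = [r for r in self.life.get(eid, {}) if r <= rev]
        return self.life[eid][max(revs)] if revs else None

    def reverse_merge(self, rev):
        """Undo what 'rev' changed by restoring each element's prior state."""
        undo = {eid: self.at(eid, rev - 1)
                for eid, hist in self.life.items() if rev in hist}
        return self.commit(undo)

b = Branch()
r1 = b.commit({7: (0, "foo.c")})   # element 7 is created
r2 = b.commit({7: None})           # element 7 is deleted
r3 = b.reverse_merge(r2)           # the same element 7 comes back
print(b.at(7, r1), b.at(7, r2), b.at(7, r3))
```

The resurrected file has the same element id as before, so later merges
can still match it with its counterparts on other branches.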


IDs

Stefan2, Brane and I have ideas about the repository schema and IDs. One
core point, to the best of my understanding so far, is that in the
<node-id.copy-id.txn-id> scheme, the <node-id> shall identify an element
uniquely across all elements of all branch families (the same <node-id>
being repeated for that element in each branch within its family); and the
<copy-id> shall identify the branch uniquely among all branches in all
branch families in the repository. The <copy-id> shall no longer be used
when the user wants a simple non-branching copy; instead, in that case a
new node-id (or tree of node-ids) shall be assigned.
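
To illustrate the intended difference between branching and plain copying
in terms of these IDs, here is a tentative sketch. The allocators, function
names and integer IDs are made up for the example, and it assumes that a
plain copy keeps the copy-id of the branch it lands in; the real schema is
still to be worked out.

```python
from collections import namedtuple
from itertools import count

NodeRevId = namedtuple("NodeRevId", "node_id copy_id txn_id")

_next_node = count(100)   # hypothetical allocator for fresh node-ids
_next_copy = count(1)     # hypothetical allocator for fresh copy-ids

def branch_tree(ids, txn_id):
    """Branching: same elements (node-ids), one new copy-id for the branch."""
    new_copy = next(_next_copy)
    return [NodeRevId(i.node_id, new_copy, txn_id) for i in ids]

def copy_tree(ids, dest_copy_id, txn_id):
    """Plain copy: new elements (fresh node-ids) inside the destination branch."""
    return [NodeRevId(next(_next_node), dest_copy_id, txn_id) for _ in ids]

trunk = [NodeRevId(1, 0, "t1"), NodeRevId(2, 0, "t1")]
br1 = branch_tree(trunk, "t2")
dup = copy_tree(trunk, 0, "t3")
print(br1)
print(dup)
```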

We'll write more about the repository schema and IDs separately.


FWIW, I realize it's a very big development, but it no longer seems
impossibly far off. The pieces are coming together.

Comments? Help?

- Julian
