Hi, folks. At last I have a plan for move tracking. I want to share this plan and get your feedback.

I want to start implementing it, but first Philip suggested, and I
agree, we should do a mock-up and see how the high-level behaviour pans
out in typical scenarios involving moves, especially merging with moves.
There's a lot of work here and implementing it would be risky unless we
first check whether it will deliver useful results.

The mock 'svn' client will implement the essence of the design, but with
the move info being initially just an explicit user input whenever the
code requires it. The next advancement can be that the move info is
stored and retrieved in any form whatsoever (revprops, versioned props,
even off-line text files).

Then we can try out the scenarios that we imagine users will expect to
work, and see whether the design does in fact make these things easy. An
example:

  0  make branches A and B with some content
  1  in branch A, rename a directory and all the files inside it
  2  in branch B, modify some files
  3  merge "automatically" from A to B
  4  in branch A, make some text mods
  5  merge "automatically" from A to B
     -- check that mods from A go to the right files in B
        with no user intervention

I'd prefer to do the mock-up in Python if we can. Does that seem
reasonable, or should I branch and hack the C code instead?


== DESIGN ==

I had an opportunity to discuss the design in person with Ben, Brane,
Philip and Stefan these last couple of weeks, which resulted in a good
few steps forward.

I have been going through some variations, but here is more or less the
design in summary.

a BRANCH ELEMENT

  - is, immutably, a file or a directory (or a symlink in future);
  - has a separate but related life-line in each branch in the same
    branch family;
  - has, in each rev in each branch, a parent directory element within
    the same branch (except for the branch root element);
  - has, in each rev in each branch, a name within its parent element.
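
To make the element idea and the example scenario above a little more
concrete, here is a minimal Python sketch of what such a mock-up could
look like. Everything in it is hypothetical and not part of any existing
Subversion code: the names Element, Tree, path_of and merge, the integer
element ids, and the whole-file "text" attribute are assumptions made for
illustration. The merge base is passed in explicitly, standing in for
whatever merge tracking the real design would use, and additions,
deletions and line-level text merging are left out.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Element:
    """One element's state in one branch at one revision (hypothetical)."""
    parent: Optional[int]        # element id of the parent; None for the branch root
    name: str                    # name within the parent element
    text: Optional[str] = None   # file content; None means a directory


Tree = Dict[int, Element]        # element id -> state; ids are shared across branches


def path_of(tree: Tree, eid: int) -> str:
    """Derive an element's current path by walking up to the branch root."""
    parts = []
    while tree[eid].parent is not None:
        parts.append(tree[eid].name)
        eid = tree[eid].parent
    return "/".join(reversed(parts))


def merge(base: Tree, source: Tree, target: Tree) -> Tree:
    """Apply the changes base -> source onto target, matching elements by id."""
    result = dict(target)
    for eid, new in source.items():
        if eid not in base or eid not in result:
            continue                          # adds/deletes: out of scope here
        old, cur = base[eid], result[eid]
        changes = {}
        for attr in ("parent", "name", "text"):
            b, s, t = getattr(old, attr), getattr(new, attr), getattr(cur, attr)
            if s == b:
                continue                      # source did not touch this attribute
            if t not in (b, s):
                raise ValueError(f"conflict on element {eid}, attribute {attr}")
            changes[attr] = s
        result[eid] = replace(cur, **changes)
    return result


# Step 0: branches A and B start with the same elements.
start: Tree = {
    0: Element(None, ""),                     # branch root
    1: Element(0, "lib"),
    2: Element(1, "foo.c", "foo v1\n"),
    3: Element(1, "bar.c", "bar v1\n"),
}
a0, b = dict(start), dict(start)

# Step 1: in A, rename the directory and the files inside it.
a1 = dict(a0)
a1[1] = replace(a1[1], name="core")
a1[2] = replace(a1[2], name="foo2.c")
a1[3] = replace(a1[3], name="bar2.c")

# Step 2: in B, modify a file.
b[3] = replace(b[3], text="bar v1, changed in B\n")

# Step 3: merge A -> B.
b = merge(a0, a1, b)

# Step 4: in A, make a text mod.
a2 = dict(a1)
a2[2] = replace(a2[2], text="foo v2, changed in A\n")

# Step 5: merge A -> B again; the base is the A state merged last time.
b = merge(a1, a2, b)

print(path_of(b, 2), repr(b[2].text))
print(path_of(b, 3), repr(b[3].text))
```

In this sketch B ends up with core/foo2.c carrying A's text mod and
core/bar2.c carrying B's own mod, because changes are matched by element
id rather than by path. Whether the real design behaves as smoothly is
exactly what the mock-up is meant to find out.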

An instance of an element in a particular branch can, at each revision,
be moved to a new parent element and/or given a new name, as long as it
stays within its branch. It cannot be moved outside its branch or into a
nested branch.

a BRANCH

  - is rooted at exactly one element (usually a directory)
  - is the set of trees rooted at its root element in all revisions at
    which the root element exists
  - is a member of exactly one branch family

The branch root element is distinguishable from non-branch-root elements
(at the repository level and at the client level).

The repository root directory is implicitly the root element of a
singleton top-level branch. (The top-level branch is not particularly
interesting to the user.) Every other branch root element is a non-root
element of a branch at the next higher branching level.

A branch is created in one of two ways:

  - creating a new element and explicitly designating it as the root
    element of a (first) branch of a new branch family; or
  - branching (see below) the root element of an existing branch to
    create a new branch of an existing branch family.

a BRANCH FAMILY

  - is a set of branches of the "same" tree;
  - results from initially creating a new branch and then directly and
    transitively branching it, including by branching a higher-level
    branch;
  - can be nested inside another branch family, in a strict hierarchy.

When branches are nested, the nesting topology is uniform: if one branch
of family F is inside a branch of family G, then every branch of family F
is inside a branch of family G.

e.g. the outer family contains branches {trunk, br1, ...} and the nested
family contains branches {fs_fs, fs_x}:

  ^/.../trunk/
    +--- subversion/libsvn_fs_fs/ ...
    +--- subversion/libsvn_fs_x/ ...
  ^/.../branches/br1/
    +--- subversion/libsvn_fs_fs/ ...
    +--- subversion/experimental/libsvn_fs_x/ ...
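
The uniform-nesting rule can be stated as a small check. The following
Python sketch is only an illustration under assumptions of mine: the
function name nesting_is_uniform, the family labels "T", "G" and "F", and
the choice to list each occurrence of a nested branch inside an outer
branch as its own entry are all hypothetical and not part of the design.

```python
from typing import Dict, Optional, Tuple

# Each entry: branch instance -> (its family, the branch instance enclosing it).
# The top-level branch has no enclosing branch (None).
BranchMap = Dict[str, Tuple[str, Optional[str]]]


def nesting_is_uniform(branches: BranchMap) -> bool:
    """True if, for each family, all its branches sit inside one outer family."""
    outer_family_of: Dict[str, Optional[str]] = {}
    for family, outer in branches.values():
        outer_family = branches[outer][0] if outer is not None else None
        # The first branch seen for a family fixes the expected outer family.
        if outer_family_of.setdefault(family, outer_family) != outer_family:
            return False
    return True


# The layout from the example: family G = {trunk, br1}, nested family F.
layout: BranchMap = {
    "top": ("T", None),
    "trunk": ("G", "top"),
    "br1": ("G", "top"),
    "trunk/libsvn_fs_fs": ("F", "trunk"),
    "trunk/libsvn_fs_x": ("F", "trunk"),
    "br1/libsvn_fs_fs": ("F", "br1"),
    "br1/libsvn_fs_x": ("F", "br1"),
}
print(nesting_is_uniform(layout))

# A branch of family F placed directly in the top-level branch breaks the rule.
broken = dict(layout)
broken["stray_fs"] = ("F", "top")
print(nesting_is_uniform(broken))
```

The first call reports the example layout as uniform; the second reports
a violation, since one branch of F would then sit inside family T while
the others sit inside family G.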

There is no fundamental distinction at the repository level between the
"original" branch and any other branch in the same family, nor a
directional relationship implied by the order in which subsequent
branches are added to the family. The "original", and the order of
subsequent branching, are only distinguishable by looking at client-level
metadata (the "copy-from" metadata and/or mergeinfo; see "copying").

MOVING

An element of a branch can be moved to a new parent element and/or a new
name within its own branch, but cannot be moved into a nested branch or
out of its own branch. In other words, a move cannot cross a branch
boundary.

The same applies to moving a whole branch, which is achieved by moving
its root element to a new parent and/or name within the outer branch.

COPYING versus BRANCHING

At the repository level, what we have previously called the "copy"
relationship is only useful as a branching relationship. The "copy-from"
information is client-level metadata, not needed by the repository
itself, and so shall be stored elsewhere. Thus, at the repository level,
copying is branching.

At the client level, branches are conceptually distinguished, and so the
terms "copying" and "branching" are naturally distinguished: copying
makes a new element (or tree) that is not a branch, and branching makes a
new branch in the same family as an existing branch.

Normally a user would only ever branch a branch, and copy a non-branch.
In practice it would be unusual to want to "copy" a branch so as to make
a new non-branch element or tree, or to create a new branch as a copy of
a non-branch, but these variants could be provided for advanced users.

A suggested user interface would have

  - "branch" applied to a branch (root) creates a new branch in the same
    branch family;
  - "copy" applied to a non-branch creates a new non-branch tree as a
    copy of the source;
  - "branch" applied to a non-branch or "copy" to a branch is an error,
    except in advanced usage.
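
The suggested rules above amount to a small decision table. As a rough
Python sketch (the function name result_kind, its parameters and the
"advanced" flag are hypothetical, invented here for illustration and not
a proposed API):

```python
def result_kind(operation: str, source_is_branch: bool,
                advanced: bool = False) -> str:
    """Return what the operation creates: 'branch' or 'non-branch'."""
    if operation not in ("branch", "copy"):
        raise ValueError(f"unknown operation: {operation!r}")
    # The normal pairings: branch a branch, copy a non-branch.
    normal = (operation == "branch") == source_is_branch
    if not normal and not advanced:
        raise ValueError(
            f"'{operation}' of a "
            f"{'branch' if source_is_branch else 'non-branch'} "
            "is an error except in advanced usage")
    # "branch" always yields a branch; "copy" always yields a non-branch.
    return "branch" if operation == "branch" else "non-branch"


print(result_kind("branch", source_is_branch=True))
print(result_kind("copy", source_is_branch=False))
print(result_kind("copy", source_is_branch=True, advanced=True))
```

In the normal "branch" case the new branch joins the source's branch
family. The sketch does not say which family a branch made from a
non-branch would belong to, since that is not settled above.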

An alternative user interface could have just one operation distinguished
by the specified source:

  - "copy" applied to a branch (root) creates a new branch in the same
    branch family;
  - "copy" applied to a non-branch creates a new non-branch tree as a
    copy of the source.

In any case, the new branch or non-branch shall be tagged with
client-level metadata saying where it was copied from.

LIFE-LINES and RESURRECTION

The life-line of an element on a given branch is a subset of the
life-line of its branch. At each revision in the life-line of its branch,
the element can exist (at an arbitrary name and parent element within the
branch) or not exist.

An element can be resurrected: that is, after being deleted, it can be
brought back into existence on the same branch, as the same element, and
not just as a new element copied from it. This is essentially the same as
what must happen when we merge the creation of this element to any other
branch on which it is not currently alive, and so it is a natural part of
branching and merging.

(Resurrection is a necessary consequence of the fact that, for
self-consistency, it must be possible to reverse-merge any change.
Reverse-merging a deletion, whether onto the same branch or onto another
branch, must resurrect an instance of the same element that was deleted,
and not just create a copy of it.)

As a branch root is an element of an outer branch, it follows that each
branch itself can be deleted and resurrected. A branch family can
therefore have zero or more of its branches in existence at any one time.

IDs

Stefan2 and Brane and I have ideas about the repository schema and IDs.

One core point, to the best of my understanding so far, is that in the
<node-id.copy-id.txn-id> scheme, the <node-id> shall identify an element
uniquely across all elements of all branch families (but repeated for
each branch within a family); and the <copy-id> shall identify the branch
uniquely among all branches in all branch families in the repository.
The <copy-id> shall no longer be used when the user wants a simple
non-branching copy; instead, in that case a new node-id (or tree of node
ids) shall be assigned.

We'll write separately more about the repository schema and IDs.

FWIW, I realize it's a very big development, but it no longer seems
impossibly far off. The pieces are coming together.

Comments? Help?

- Julian