NODE_DATA (aka fourth tree)

Erik Huelsmann Sat, 10 Jul 2010 14:56:04 -0700

As announced by gstein before, we've had some discussion on the
NODE_DATA structure which should allow storing multiple levels of tree
manipulation in our wc-db. This mail aims at describing my progress on
the subject so far. Please review and comment.



Introduction
----------------

What's the 4th tree about? The 4th tree is not one tree; rather, it's
the ability to store overlapping tree changes in our WORKING tree.
Take the following tree:

root
 +- A - C - file
 \- B - C - file

Then, imagine replacing A with B. All would be fine with our current
single level WORKING representation. However, if we replace 'file' in
the copied tree, a single level won't do anymore: if you revert the
replacement of file, you want to revert to what was there when the
tree was copied. The other option - which you don't want because it
would result in an inconsistent tree - would be that wc-ng would
revert to what was there even before the copy operation.

To be able to revert the 'file' replacement independently of the 'A'
replacement, you need 2 levels of WORKING nodes for 'file': one for
the direct replacement and one for the replacement that comes with
replacing 'A'. By the same logic, many levels may be required to
model complicated working copy changes.


What this change is not
----------------------------------

This change does not alter the current behaviour of libsvn_wc, in
which modifying an already modified tree is a destructive operation.
The multi-level model exists only to keep track of WORKING tree
changes, not to make changes to the ACTUAL tree visible again after
reverting a replaced subtree.



Proposed change
-------------------------

Greg made a proposal on the list some time ago which allows the
required multiplicity of WORKING nodes by creating a new table:
NODE_DATA. The table was proposed to hold a subset of the columns
currently in the BASE_NODE and WORKING_NODE tables.

The rationale for storing the BASE_NODE data in the table too is
that a query for a node which doesn't have a WORKING version will
simply return the BASE version. That way, there's no need to teach the
code about the absence of WORKING. Although the BASE_NODE information
is put in this table, this doesn't mean the BASE_NODE and WORKING_NODE
concepts are being redefined, other than allowing layered WORKING_NODE
(sub)trees.


Columns to be placed in NODE_DATA:

 * wc_id
 * local_relpath
 * oproot_distance
 * presence
 * kind
 * revnum
 * checksum
 * translated_size
 * last_mod_time
 * changed_rev
 * changed_date
 * changed_author
 * depth
 * properties
 * dav_cache
 * symlink_target
 * file_external

This means these columns stay in WORKING_NODE (next to its key, of course):

 * copyfrom_repos_id
 * copyfrom_repos_path
 * copyfrom_revnum
 * moved_here
 * moved_to

These columns can stay in WORKING_NODE, because all children inherit
their values from the oproot. I.e. a subdirectory of a copied
directory inherits the copy/move info, unless it's been copied/moved
itself, in which case it has its own copy information.
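
The inheritance rule above could be sketched roughly as follows. This
is an illustration in Python only, not libsvn_wc code: the function
names, the dict standing in for WORKING_NODE, the revision number 10,
and the idea of deriving a child's copyfrom path by appending the
remaining path components to the oproot's copyfrom path are all
assumptions made for the example.

```python
def oproot_of(local_relpath, oproot_distance):
    """Split a node's path into (oproot path, remainder below the oproot).

    oproot_distance is the number of path components between the node
    and its operation root; 0 means the node is the oproot itself.
    """
    if oproot_distance == 0:
        return local_relpath, ""
    parts = local_relpath.split("/")
    return ("/".join(parts[:-oproot_distance]),
            "/".join(parts[-oproot_distance:]))


def effective_copyfrom(working_node, local_relpath, oproot_distance):
    """Look up copy info on the oproot's row and extend it to the child.

    working_node is a hypothetical stand-in for the WORKING_NODE table:
    a dict keyed by local_relpath.
    """
    oproot, remainder = oproot_of(local_relpath, oproot_distance)
    row = working_node[oproot]
    path = row["copyfrom_repos_path"]
    if remainder:
        path = path + "/" + remainder
    return path, row["copyfrom_revnum"]


# 'A' was replaced by a copy of 'B'; only the oproot row carries copy info.
working_node = {"A": {"copyfrom_repos_path": "B", "copyfrom_revnum": 10}}
print(effective_copyfrom(working_node, "A/C/file", 2))
```

With the example tree from the introduction, 'A/C/file' lies two
components below its oproot 'A', so its copy information is taken
from the row for 'A'.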


As described before, if you sort the rows relating to a certain path
in ascending order of their distance to the oproot (oproot_distance),
the first row is always the 'current' WORKING state applicable to the
node, provided the distance between the node and the working copy
root is used to identify the BASE_NODE data.
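
A minimal sketch of that lookup, using Python's sqlite3 module and a
cut-down NODE_DATA table. The column subset, the helper names and the
sample rows are assumptions for illustration; this is not the actual
wc-db schema or API.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a reduced NODE_DATA table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE node_data ("
        " wc_id INTEGER NOT NULL,"
        " local_relpath TEXT NOT NULL,"
        " oproot_distance INTEGER NOT NULL,"
        " presence TEXT NOT NULL,"
        " kind TEXT NOT NULL,"
        " PRIMARY KEY (wc_id, local_relpath, oproot_distance))"
    )
    return db


def current_row(db, wc_id, local_relpath):
    """Return the row with the smallest oproot_distance, or None."""
    return db.execute(
        "SELECT oproot_distance, presence, kind FROM node_data"
        " WHERE wc_id = ? AND local_relpath = ?"
        " ORDER BY oproot_distance ASC LIMIT 1",
        (wc_id, local_relpath),
    ).fetchone()


def revert_layer(db, wc_id, local_relpath, oproot_distance):
    """Drop one layer; the next-closest layer becomes current."""
    db.execute(
        "DELETE FROM node_data"
        " WHERE wc_id = ? AND local_relpath = ? AND oproot_distance = ?",
        (wc_id, local_relpath, oproot_distance),
    )


db = open_db()
db.executemany(
    "INSERT INTO node_data VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A/C/file", 3, "normal", "file"),  # BASE: 3 levels below wc root
        (1, "A/C/file", 2, "normal", "file"),  # layer from replacing 'A'
        (1, "A/C/file", 0, "normal", "file"),  # direct replacement of 'file'
    ],
)
print(current_row(db, 1, "A/C/file"))  # the direct replacement layer
revert_layer(db, 1, "A/C/file", 0)
print(current_row(db, 1, "A/C/file"))  # now the layer that came with 'A'
```

In this sketch, a node without any WORKING layer simply yields its
BASE row from the same query, which matches the rationale for keeping
the BASE_NODE data in the same table.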


Most, if not all, of the changes to the underlying table structure
should stay hidden behind the wc-db API.



Relevance to 1.7
----------------------

Why do we need this change now? Why can't it wait until we've finished
1.7? After all, it's just polishing the way we versioned directories
in wc-1, right?

Not exactly. Currently, mixed-revision working copies are modelled
using an oproot for each subtree with its own revision number. That
means that without this change, effectively we can't represent
mixed-revision working copy trees. So, in order to achieve feature
parity with 1.6, we need to realise this change before 1.7.



Well, that's basically it. Comments?


Bye,


Erik.
