Re: NODE_DATA (2nd iteration)

Julian Foad Tue, 03 Aug 2010 06:19:11 -0700

On Mon, 2010-07-12, Erik Huelsmann wrote:
> After lots of discussion regarding the way NODE_DATA/4th tree should
> be working, I'm now ready to post a summary of the progress. In my
> last e-mail (http://svn.haxx.se/dev/archive-2010-07/0262.shtml) I
> stated why we need this; this post is about the conclusion of what
> needs to happen. Also included are the first steps there.
> 
> 
> With the advent of NODE_DATA, we distinguish node values specifically
> related to BASE nodes, those specifically related to "current" WORKING
> nodes and those which are to be maintained for multiple levels of
> WORKING nodes (not only the "current" view) (the latter category is
> most often also shared with BASE).
> 
> The respective tables will hold the columns shown below.
> 
> 
> -------------------------
> TABLE WORKING_NODE (
>   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>   local_relpath  TEXT NOT NULL,
>   parent_relpath  TEXT,
>   moved_here  INTEGER,
>   moved_to  TEXT,
>   original_repos_id  INTEGER REFERENCES REPOSITORY (id),
>   original_repos_path  TEXT,
>   original_revnum  INTEGER,
>   translated_size  INTEGER,
>   last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
>   keep_local  INTEGER,
> 
>   PRIMARY KEY (wc_id, local_relpath)
>   );
> 
> CREATE INDEX I_WORKING_PARENT ON WORKING_NODE (wc_id, parent_relpath);
> --------------------------------
> 
> The moved_* and original_* columns are typical examples of "WORKING
> fields only maintained for the visible WORKING nodes": the original_*
> and moved_* fields are inherited from the operation root by all
> children part of the operation. The operation root will be the visible
> change on its own level, meaning it'll have rows both in the
> WORKING_NODE and NODE_DATA tables. The fact that these columns are not
> in the NODE_DATA table means that tree changes are not preserved
> across overlapping changes. This is fully compatible with what we do
> today: changes to higher levels destroy changes to lower levels.
> 
> The translated_size and last_mod_time columns exist in WORKING_NODE
> and BASE_NODE; they explicitly don't exist in NODE_DATA. The fact that
> they exist in BASE_NODE is a bit of a hack: it's to prevent creation
> of WORKING_NODE data for every file which has keyword expansion or eol
> translation properties set: these columns serve only to optimize
> working copy scanning for changes and as such only relate to the
> visible WORKING_NODEs.
>
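
To make the role of translated_size and last_mod_time concrete, here is a
minimal sketch of the kind of cheap "has this file changed?" test they
enable.  It assumes that translated_size is the on-disk size of the
working file and that last_mod_time is its mtime in microseconds; the
function name is hypothetical and this is not the actual libsvn_wc code.

```python
import os


def probably_unmodified(path, recorded_size, recorded_mtime_usec):
    """Return True if the on-disk size and mtime match the recorded values.

    A match lets a scanner skip the expensive content comparison; a
    mismatch only means the file must be examined more closely.
    """
    st = os.stat(path)
    mtime_usec = st.st_mtime_ns // 1000  # nanoseconds -> microseconds
    return st.st_size == recorded_size and mtime_usec == recorded_mtime_usec
```

Only when this returns False would the file need to be detranslated and
compared against its pristine text.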


Can we come up with an English description of what each table will now
represent?

"The BASE_NODE table lists the existing node-revs in the repository that
comprise the mixed-revision tree that was most recently updated/switched
to or checked out.  (The kind and content of these nodes are not here;
see the NODE_DATA table.)"

>  TABLE BASE_NODE (
>   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>   local_relpath  TEXT NOT NULL,
>   repos_id  INTEGER REFERENCES REPOSITORY (id),
>   repos_relpath  TEXT,

We need a revision number column here to go along with repos_id and
relpath to make a valid node-rev reference, don't we?

>   parent_relpath  TEXT,

(While we're reorganising, can we move that "parent_relpath" column so
that it is adjacent to "local_relpath"?)

>   translated_size  INTEGER,
>   last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
>   dav_cache  BLOB,
>   incomplete_children  INTEGER,
>   file_external  TEXT,
> 
>   PRIMARY KEY (wc_id, local_relpath)
>   );
> 

"The NODE_DATA table records the kind and shallow content (props, text,
link target) of each node in the WC.  It includes both the nodes that
comprise the currently 'visible' (or 'actual' or 'on-disk') state of the
WC and also all nodes that are part of a copied or moved tree but
currently shadowed by a replacement performed inside that tree.

At least one row exists for each WC path, including paths with no change
and all paths affected by a tree change (add, delete, etc.).  If the
same path is affected by multiple levels of tree change - a replacement
inside a copied directory, for example - then multiple rows exist with
different 'op_depth' values."
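
A small sketch of that layering, using Python's sqlite3 module and a
cut-down NODE_DATA table.  It assumes that the 'visible' row for a path
is the one with the highest op_depth; the presence value 'normal' and the
sample paths are illustrative only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE NODE_DATA (
  wc_id          INTEGER NOT NULL,
  local_relpath  TEXT NOT NULL,
  op_depth       INTEGER NOT NULL,
  presence       TEXT NOT NULL,
  kind           TEXT NOT NULL,
  PRIMARY KEY (wc_id, local_relpath, op_depth)
);
"""


def visible_nodes(conn, wc_id):
    """Map each path to the (op_depth, presence, kind) of its topmost row."""
    rows = conn.execute(
        """
        SELECT n.local_relpath, n.op_depth, n.presence, n.kind
          FROM NODE_DATA AS n
         WHERE n.wc_id = ?
           AND n.op_depth = (SELECT MAX(m.op_depth)
                               FROM NODE_DATA AS m
                              WHERE m.wc_id = n.wc_id
                                AND m.local_relpath = n.local_relpath)
        """,
        (wc_id,),
    )
    return {path: (depth, presence, kind)
            for path, depth, presence, kind in rows}


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO NODE_DATA VALUES (?, ?, ?, ?, ?)",
        [
            (1, "A",   0, "normal", "dir"),   # unchanged BASE node
            (1, "C",   1, "normal", "dir"),   # root of a copy
            (1, "C/f", 1, "normal", "file"),  # child brought in by the copy
            (1, "C/f", 2, "normal", "dir"),   # replacement inside the copy
        ],
    )
    return visible_nodes(conn, 1)
```

Here "C/f" has two rows; the op_depth 2 row is the visible one and the
op_depth 1 row is the shadowed node from the copy.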

> TABLE NODE_DATA (
>   wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
>   local_relpath  TEXT NOT NULL,
>   op_depth  INTEGER NOT NULL,
>   presence  TEXT NOT NULL,
>   kind  TEXT NOT NULL,
>   checksum  TEXT,
>   changed_rev  INTEGER,
>   changed_date  INTEGER,  /* an APR date/time (usec since 1970) */
>   changed_author  TEXT,

The changed_* columns can only belong to a node-rev that exists in the
repository.  What node-rev do they belong to and why aren't they
alongside the node-rev details?

(The changed_* columns convey essentially a rev number and two of the
rev-props associated with that revnum that can be used in keyword
expansions.  We should consider representing that information in a more
general form, both to avoid tying the DB format to the choice of those
two particular revprops, and to avoid the redundancy of storing these
same date and author values N times.)
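
One possible shape for such a more general form, purely as a sketch: keep
only changed_rev on the node row and look the rev-props up in a separate
table keyed by revision and property name.  The REVPROP table, the
helper name and the sample values are hypothetical; only the rev-prop
names svn:author and svn:date are real.

```python
import sqlite3


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE NODE_DATA (
          local_relpath  TEXT PRIMARY KEY,
          changed_rev    INTEGER
        );
        -- Hypothetical table: one row per (revision, rev-prop name).
        CREATE TABLE REVPROP (
          revnum  INTEGER NOT NULL,
          name    TEXT NOT NULL,
          value   TEXT,
          PRIMARY KEY (revnum, name)
        );
    """)
    conn.executemany("INSERT INTO NODE_DATA VALUES (?, ?)",
                     [("A/f", 7), ("A/g", 7)])
    conn.executemany("INSERT INTO REVPROP VALUES (?, ?, ?)",
                     [(7, "svn:author", "alice"),
                      (7, "svn:date", "2010-08-03T13:19:11.000000Z")])
    return conn


def changed_info(conn, relpath):
    """Return (changed_rev, author, date) for a node, joining on the rev."""
    return conn.execute(
        """
        SELECT n.changed_rev, a.value, d.value
          FROM NODE_DATA AS n
          LEFT JOIN REVPROP AS a
                 ON a.revnum = n.changed_rev AND a.name = 'svn:author'
          LEFT JOIN REVPROP AS d
                 ON d.revnum = n.changed_rev AND d.name = 'svn:date'
         WHERE n.local_relpath = ?
        """,
        (relpath,),
    ).fetchone()
```

Both nodes changed in r7 share a single pair of REVPROP rows instead of
each carrying its own copy of the author and date.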


>   depth  TEXT,
>   symlink_target  TEXT,
>   properties  BLOB,

(While we're rearranging, can we group the node-content fields together:
kind, properties, checksum, symlink_target?)

>   PRIMARY KEY (wc_id, local_relpath, oproot)

s/oproot/op_depth/?

>   );
> 
> CREATE INDEX I_NODE_WC_RELPATH ON NODE_DATA (wc_id, local_relpath);
> 
> 
> Which leaves the NODE_DATA structure above. The op_depth column
> contains the depth of the node - relative to the wc root - on which
> the operation was run which caused the creation of the given NODE_DATA
> node.  In the final scheme (based on single-db), the value will be 0
> for base and a positive integer for WORKING related data.

Let's assume single-db.  By the last sentence, I understand: For each
BASE_NODE row there is a corresponding NODE_DATA row with 'op_depth' = 0;
for every node brought in by a tree operation (copy, move, add) to an
immediate child of the WC root there is a NODE_DATA row with 'op_depth' =
1; for every child of a child ... 2; and so on.
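
Under that reading, the value for a WORKING row is just the number of
path components in the operation root's local_relpath.  A minimal sketch,
with a hypothetical helper name, and assuming that an empty relpath (the
WC root itself) is not a valid operation root because 0 is reserved for
BASE:

```python
def working_op_depth(op_root_relpath):
    """Depth of an operation root relative to the WC root.

    "A" -> 1, "A/B" -> 2, and so on.  The empty relpath is rejected here
    because 0 is reserved for BASE rows (an assumption of this sketch).
    """
    if op_root_relpath == "":
        raise ValueError("0 is reserved for BASE; expected a non-root path")
    return op_root_relpath.count("/") + 1
```

Every node brought in by the same operation would carry the depth of the
operation root, not its own depth: copying to "A/B" gives both "A/B" and
"A/B/f" the value 2.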


- Julian


> In order to be able to implement NODE_DATA even without having a fully
> functional SINGLE_DB yet, a transitional node numbering scheme needs
> to be devised. The following numbers will apply: BASE == 0,
> WORKING-this-dir == 1, WORKING-any-immediate-child == 2.
> 
> 
> Other transitioning related remarks:
> 
>  * Conditionally protected experimental sections, just like with SINGLE_DB
>  * Initial implementation will simply replace the current
> functionality of the 2 tables, from there we can work our way through
> whatever needs doing.
>  * Am I forgetting any others?
> 
> Bye,
> 
> Erik.
