Hi there,

After two hours of analysis, it seems that I have found
the correct definition for the "node line ID" as required
by Julian's move API design:

NodeLineID(Path,Rev) := LastCopyTarget@LastCopyRev

where LastCopyRev is the revision of the latest copy that
involved Path@Rev (or any of its parents), and LastCopyTarget
is the path that Path got copied to in LastCopyRev.
If a node has never been copied, LastCopyRev is the
revision in which the node got added and LastCopyTarget
is the path at which it was added.

Note that LastCopyTarget may differ from Path if moves
in (LastCopyRev, Rev] changed the path (or any of its
parents).
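
To make the definition concrete, here is a minimal sketch in Python.
It is not the FS implementation: the history model (a dict mapping
revision numbers to "add" / "copy" / "move" operations) and the names
node_line_id and _is_within are made up for illustration. It assumes
that every node is added explicitly, that paths carry no trailing
slash, and it ignores deletions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy history: revision -> list of operations, each one of
#   ("add", path), ("copy", src, dst) or ("move", src, dst).
History = Dict[int, List[Tuple[str, ...]]]


def _is_within(path: str, root: str) -> bool:
    """True if path is root itself or lies somewhere below root."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def node_line_id(history: History, path: str, rev: int) -> str:
    """Return LastCopyTarget@LastCopyRev for path@rev in the toy model."""
    current = path  # name of the node while we walk back in time
    for r in range(rev, 0, -1):
        for op in reversed(history.get(r, [])):
            kind = op[0]
            if kind == "add" and op[1] == current:
                # Never copied: the ID is the path@rev of the add.
                return f"{current}@{r}"
            if kind in ("copy", "move") and _is_within(current, op[2]):
                if kind == "copy":
                    # The node or one of its parents was copied here.
                    return f"{current}@{r}"
                # A move keeps the ID; continue under the old name.
                current = op[1] + current[len(op[2]):]
    raise KeyError(f"{path}@{rev} does not exist in this history")


history: History = {
    1: [("add", "/trunk"), ("add", "/trunk/a.c")],
    2: [("move", "/trunk/a.c", "/trunk/b.c")],
    3: [("copy", "/trunk", "/branches/x")],
    4: [("move", "/branches/x/b.c", "/branches/x/c.c")],
}

print(node_line_id(history, "/trunk/b.c", 4))
print(node_line_id(history, "/branches/x/c.c", 4))
```

In this toy history, /trunk/b.c@4 keeps the ID of the original add
despite the rename in r2, while /branches/x/c.c@4 gets the ID of the
copy in r3 under the name it had at that time, i.e. a case where
LastCopyTarget != Path.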

Properties of NodeLineID as defined above:

* defined for any path@rev in the repository
* exactly one ID for the same path@rev
* not defined for any path@rev that does not exist
  in the repository

* is unique for all paths at any one revision
* does change [exactly] when path or any of its parents
  gets copied and only for the copied tree
* does *not* change when path or any of its parents
  gets renamed / moved

I.e. LastCopyTarget@LastCopyRev assigns exactly one
ID to each line of non-copying node history in the repository.
Any alternative definition should, therefore, be equivalent
to the one given here.

In particular, the following cannot simply be used:

* (nodeId, copyID) since there are usually fewer nodes
  in the DAG than there are paths @HEAD alone
  (approx. 1:2 for apache.org)
* replacing the current lazy copying with deep copies
  would create nodes exactly for all
  LastCopyTarget@LastCopyRev values, i.e. is equivalent
  but less space-efficient

* anything involving parent paths or IDs, because a move
  to a different parent must not change the node line ID.

The given definition even has nice computational properties:

* LastCopyTarget@LastCopyRev can be expressed directly
  as a developer-readable string
* The code to find the LastCopyRev is part of the standard
  history following code and relatively efficient
* For all entries in the same directory@rev, the effort can be
  lowered to O(1) in many cases since the respective parents
  have already been investigated (depends on internal node-ID
  assignment rules).

-- Stefan^2.
