C. Michael Pilato wrote:
> Julian Foad wrote:
> > Update to <libsvn_fs_base/notes/structure> attached - any review
> > comments before I commit?
>
> Gah. Attachment approach makes this hard to review...

Oh, sorry, I thought everyone was OK with 'text/x-patch' attachments.
Thanks for having a look despite that. Pasting in-line this time.

> I think there might be some parts that are incorrect. For example:
>
>    -    (HEADER PROP-KEY DATA-KEY [EDIT-DATA-KEY])
>    +    (HEADER PROP-KEY DATA-KEY [DATA-KEY-UNIQID] [EDIT-DATA-KEY])
>
> If you look at the faux BNF stuff at the end, I think you'll find that we
> didn't squeeze a new atom DATA-KEY-UNIQID into the skel -- we made DATA-KEY
> be either an atom (for compat with old repositories) or a 2-tuple `(DATA-KEY
> DATA-KEY-UNIQID)'.

Heh, saw that, tried to write an informal equivalent for the purposes
of this descriptive part of the document, as some other examples were
informal (omitting square brackets that would denote optionality) but I
don't fully grasp the syntax yet and I'll admit that was a misleading
thing to do. I will write it formally, as:

[[[
    (HEADER PROP-KEY DATA-INFO [EDIT-DATA-KEY])

where

    DATA-INFO ::= DATA-KEY | (DATA-KEY DATA-KEY-UNIQID)

and DATA-KEY identifies the representation for [...]
]]]

I have also found and formalized two or three more instances where
definitions were informally abbreviated.

> So, before committing, you should double-check all your changes against that
> BNF stuff. If there's any part of the structure document that is likely to
> be up-to-date, it's that part.

That's what I thought, so that's where I got much of my info from.

New patch pasted in-line below. Thanks again.

- Julian

[[[
Update the BDB schema documentation to reflect changes that have been made
between Subversion 1.0 and 1.6.

* subversion/libsvn_fs_base/notes/structure
  Several updates and clarifications.
--This line, and those below, will be ignored--

Index: subversion/libsvn_fs_base/notes/structure
===================================================================
--- subversion/libsvn_fs_base/notes/structure (revision 880857)
+++ subversion/libsvn_fs_base/notes/structure (working copy)
@@ -98,13 +98,13 @@
 To determine the actual type currently in use for the keys of a given
 table, you are invited to check out the "Appendix: Filesystem
 structure summary" section of this document.


-NODE-REVISION and HEADER: how we represent a node revision
+NODE-REVISION: how we represent a node revision

 We represent a given revision of a file or directory node using a list
 skel (see skel.h for an explanation of skels). A node revision skel
 has the form:

     (HEADER PROP-KEY KIND-SPECIFIC ...)

@@ -113,13 +113,13 @@
 PROP-KEY is the key of the representation that contains this node's
 properties list, and the KIND-SPECIFIC elements carry data dependent
 on what kind of node this is --- file, directory, etc.

 HEADER has the form:

-    (KIND CREATED-PATH PRED-ID PRED-COUNT)
+    (KIND CREATED-PATH [PRED-ID [PRED-COUNT [HAS-MERGEINFO MERGEINFO-COUNT]]])

 where:

    * KIND indicates what sort of node this is. It must be one of the
      following:
      - "file", indicating that the node is a file (see FILE below).
@@ -131,12 +131,15 @@
    * PRED-ID, if present, indicates the node revision which is the
      immediate ancestor of this node.

    * PRED-COUNT, if present, indicates the number of predecessors the
      node revision has (recursively).

+   * HAS-MERGEINFO and MERGEINFO-COUNT, if present, indicate ...
+     ### TODO
+
 Note that a node cannot change its kind from one revision to the next.
 A directory node is always a directory; a file node is always a file;
 etc. The fact that the node's kind is stored in each node revision,
 rather than in some revision-independent place, might suggest that
 it's possible for a node change kinds from revision to revision, but
 Subversion does not allow this.
@@ -161,18 +164,25 @@

 FILE: how files are represented.

 If a NODE-REVISION's header's KIND is "file", then the node-revision
 skel represents a file, and has the form:

-    (HEADER PROP-KEY DATA-KEY [EDIT-DATA-KEY])
+    (HEADER PROP-KEY DATA-INFO [EDIT-DATA-KEY])
+
+where
+
+    DATA-INFO ::= DATA-KEY | (DATA-KEY DATA-KEY-UNIQID)

-where DATA-KEY identifies the representation for the file's current
+and DATA-KEY identifies the representation for the file's current
 contents, and EDIT-DATA-KEY identifies the representation currently
 available for receiving new contents for the file.

+DATA-KEY-UNIQID ...
+### TODO
+
 See discussion of representations later.


 DIR: how directories are represented.

@@ -254,37 +264,39 @@
 and parse it appropriately.

 A representation has the form:

    (HEADER KIND-SPECIFIC)

 where HEADER is

-   (KIND TXN [CHECKSUM])
+   (KIND TXN [MD5 [SHA1]])

 The KIND is "fulltext" or "delta". TXN is the txn ID for the txn in
-which this representation was created. CHECKSUM is a checksum of the
+which this representation was created. MD5 is a checksum of the
 representation's contents, that is, what the representation produces,
 regardless of whether it is stored deltified or as fulltext. (For
-compatibility with older versions of Subversion, CHECKSUM may be
+compatibility with older versions of Subversion, MD5 may be
 absent, in which case the filesystem behaves as though the checksum is
-there and is correct.)
+there and is correct.) An additional kind of checksum, SHA1, is present
+in newer formats, starting with version ...
+### TODO

 The TXN also serves as a kind of mutability flag: if txn T tries to
 change a representation's contents, but the rep's TXN is not T, then
 something has gone horribly wrong and T should leave the rep alone
 (and probably error). Of course, "change a representation" here means
 changing what the rep's consumer sees.
 Switching a representation's storage strategy, for example from
 fulltext to deltified, wouldn't count as a change, since that
 wouldn't affect what the rep produces.

 KIND-SPECIFIC varies considerably depending on the kind of
 representation. Here are the two forms currently recognized:

-   (("fulltext" TXN CHECKSUM) KEY)
-       The data is at KEY in the `strings' table.
+   (("fulltext" TXN [MD5 [SHA1]]) STRING-KEY)
+       The data is at STRING-KEY in the `strings' table.

-   (("delta" TXN CHECKSUM) (OFFSET WINDOW) ...)
+   (("delta" TXN [MD5 [SHA1]]) (OFFSET WINDOW) ...)
        Each OFFSET indicates the point in the fulltext that this
        element reconstructs, and WINDOW says how to reconstruct it:

        WINDOW ::= (DIFF SIZE REP-KEY [REP-OFFSET]) ;
        DIFF   ::= ("svndiff" VERSION STRING-KEY)

@@ -477,13 +489,13 @@
 where:

    * ROOT-ID is the node revision ID of the committed transaction's (or
      revision's) root node.

-   * REVISION represents the revision that was created when the
+   * REV represents the revision that was created when the
      transaction was committed.

    * PROPLIST is a skel giving the revision properties for the
      committed transaction.

    * COPIES contains a list of keys into the `copies' table,
@@ -570,12 +582,18 @@
     ("copy" SRC-PATH SRC-TXN DST-NODE-ID)
     ("soft-copy" SRC-PATH SRC-TXN DST-NODE-ID)

 where:

+   * "copy" indicates an explicitly requested copy, and "soft-copy"
+     indicates a node that was cloned internally as part of an
+     explicitly requested copy of some parent directory. See the
+     section "Copies and Copy IDs" in the file <fs-history> for
+     details.
+
    * SRC-PATH and SRC-TXN are the canonicalized absolute path and
      transaction ID, respectively, of the source of the copy.

    * DST-NODE-ID represents the new node revision created as a result
      of the copy.

@@ -590,23 +608,23 @@
 Locks

 When a caller locks a file -- reserving an exclusive right to modify
 or delete it -- an lock object is created in a `locks' table.

-The `locks' table is a btree whose key is a UUID string (also known as
-a "lock-token"), and whose value is a skel representing a lock. The
+The `locks' table is a btree whose key is a UUID string known as
+a "lock-token", and whose value is a skel representing a lock. The
 fields in the skel mirror those of an svn_lock__t (see svn_types.h):

-    ("lock" PATH UUID OWNER COMMENT XML-P CREATION-DATE EXPIRATION-DATE)
+    ("lock" PATH TOKEN OWNER COMMENT XML-P CREATION-DATE EXPIRATION-DATE)

 where:

    * PATH is the absolute filesystem path reserved by the lock.

-   * UUID is the universally unique identifier of the lock, also known
+   * TOKEN is the universally unique identifier of the lock, known
      as the lock-token. This is the same as the row's key.

    * OWNER is the authenticated username that "owns" the lock.

    * COMMENT is a string describing the lock. It may be empty, or it
      might describe the rationale for locking.

@@ -912,13 +930,13 @@
 Copies:

   COPY ::= REAL-COPY | SOFT-COPY
   REAL-COPY ::= ("copy" SRC-PATH SRC-TXN DST-NODE-ID)
   SOFT-COPY ::= ("soft-copy" SRC-PATH SRC-TXN DST-NODE-ID)
   SRC-PATH ::= atom ;
-  SRC-REV ::= TXN ;
+  SRC-TXN ::= TXN ;
   DST-NODE-ID ::= NODE-REV-ID ;

 Entries lists:

   ENTRIES ::= (ENTRY ...) ;

]]]
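
For anyone reviewing the bracket notation, here is a rough sketch of how
I read the optional-element forms in the patch. It is illustration only:
skels are modelled as plain Python lists of strings (real skels hold byte
strings and are parsed by the C code described in skel.h), and all the
function and field names below are hypothetical, not anything in
libsvn_fs_base.

```python
# Sketch only: skels modelled as Python lists of str atoms (an assumption).
# None of these helper names exist in Subversion.

def split_data_info(data_info):
    """DATA-INFO ::= DATA-KEY | (DATA-KEY DATA-KEY-UNIQID)"""
    if isinstance(data_info, str):
        # Old form: a bare atom, so there is no uniqid.
        return data_info, None
    if isinstance(data_info, list) and len(data_info) == 2:
        # New form: a two-element list.
        return data_info[0], data_info[1]
    raise ValueError("malformed DATA-INFO: %r" % (data_info,))


def _fill(names, header, allowed_lengths):
    """Map header elements onto names; absent optional fields become None."""
    if len(header) not in allowed_lengths:
        raise ValueError("unexpected header length %d" % len(header))
    fields = dict.fromkeys(names)
    fields.update(zip(names, header))
    return fields


def split_node_rev_header(header):
    """(KIND CREATED-PATH [PRED-ID [PRED-COUNT [HAS-MERGEINFO MERGEINFO-COUNT]]])"""
    names = ("kind", "created_path", "pred_id", "pred_count",
             "has_mergeinfo", "mergeinfo_count")
    # The nesting means the last two fields appear together or not at all,
    # so a five-element header is not valid.
    return _fill(names, header, (2, 3, 4, 6))


def split_rep_header(header):
    """(KIND TXN [MD5 [SHA1]])"""
    names = ("kind", "txn", "md5", "sha1")
    # SHA1 can only appear if MD5 does.
    return _fill(names, header, (2, 3, 4))
```

The point of the nested brackets, as opposed to a run of independent
`[X] [Y]` elements, is that position alone tells you which field an
element is: a later optional element can never appear without all the
earlier ones.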