No comments on the content, but reading the ensuing email thread, it may be useful to put the document in notes/wc-ng/pristines, and add questions / comments / corrections there. It would allow folks down the road to see all the critiques in one location, rather than reading N mails. </bike-shed>
-Hyrum

On Tue, Feb 15, 2011 at 3:06 PM, Julian Foad <julian.f...@wandisco.com> wrote:
> Would anyone be able to review this spec please? I'm trying to get
> straight what locking / access control rules need to be.
>
> /*
>  * THE PRISTINE STORE
>  * ==================
>  *
>  * === Introduction ===
>  *
>  * The Pristine Store is the part of the Working Copy metadata that holds
>  * a local copy of the full text of the base version of each WC file.
>  *
>  * Texts in the Pristine Store are addressed only by their SHA-1 checksum.
>  * The Pristine Store does not track which text relates to which repository
>  * and revision and path. The Pristine Store does not hold pristine copies
>  * of directories, nor of properties.
>  *
>  * The Pristine Store data is held in
>  *   * the 'PRISTINE' table in the SQLite Data Base (SDB), and
>  *   * the files in the 'pristine' directory.
>  *
>  * This specification uses SDB transactions to ensure the consistency of
>  * writes and reads.
>  *
>  * ==== Invariants ====
>  *
>  * The operating procedures below maintain the following invariants.
>  * These invariants apply at all times except within the SDB txns defined
>  * below.
>  *
>  * * Each row in the PRISTINE table has an associated pristine text file
>  *   that is not open for writing and is available for reading and whose
>  *   content matches the columns 'size', 'checksum', 'md5_checksum'.
>  *
>  * ==== Operating Procedures ====
>  *
>  * The steps should be carried out in the order specified. (See rationale.)
>  *
>  * * To add a pristine, do the following inside an SDB txn:
>  *   * Add the table row, and set the refcount as desired. If a row
>  *     already exists, add the desired refcount to its refcount, and
>  *     preferably verify the old row matches the new metadata.
>  *   * Create the file. Creation should be fs-atomic, e.g. by moving a
>  *     new file into place, so as never to orphan a partial file.
>  *     If a file already exists, preferably leave it rather than replace it,
>  *     and optionally verify it matches the new metadata (e.g. length).
>  *
>  * * To remove a pristine, do the following inside an SDB txn:
>  *   * First, check refcount == 0, and abort if not.
>  *   * Delete the table row.
>  *   * Delete the file or move it away. (If not present, log a
>  *     consistency error but, in a release build, return success.)
>  *
>  * * To query a pristine's existence or SDB metadata, the reader must:
>  *   * Ensure no pristine-remove txn is in progress while querying it.
>  *
>  * * To read a pristine text, the reader must:
>  *   * Ensure no pristine-remove txn is in progress while querying and
>  *     opening it.
>  *   * Ensure the pristine text remains in the store continuously from
>  *     opening it for the duration of the read. (Perhaps by ensuring
>  *     refcount remains >= 1 and/or by cooperating with the clean-up
>  *     code.)
>  *
>  * ==== Rationale ====
>  *
>  * * Adding a pristine:
>  *   * We can't add the file *before* the SDB txn takes out a lock,
>  *     because that would leave a gap in which another process could
>  *     see this file as an orphan and delete it.
>  *   * Within the txn, the table row could be added after creating the
>  *     file; it makes no difference as it will not become externally
>  *     visible until commit. But then we would have to take out a lock
>  *     explicitly before adding the file. Adding the row takes out a
>  *     lock implicitly, so doing it first avoids an extra step.
>  *   * Leaving an existing file in place is less likely to interfere with
>  *     processes that are currently reading from the file. Replacing it
>  *     might also be acceptable, but that would need further
>  *     investigation.
>  *
>  * * Removing a pristine:
>  *   * We can't remove the file *after* the SDB txn that updates the
>  *     table, because that would leave a gap in which another process
>  *     might re-add this same pristine file and then we would delete it.
>  *   * Within the txn, the table row could be removed after deleting the
>  *     file, but see the rationale for adding a pristine.
>  *   * In a typical use case for removing a pristine text, the caller
>  *     would check the refcount before starting this txn, but
>  *     nevertheless it may have changed and so must be checked again
>  *     inside the txn.
>  *
>  * * In the add and remove txns, we need to acquire an SDB 'RESERVED'
>  *   lock before adding or removing the file. This can be done by starting
>  *   the txn with 'BEGIN IMMEDIATE' and/or by performing an SDB write (such
>  *   as the table row update). ### Would a 'SHARED' lock be sufficient,
>  *   and if so would it be noticeably better?
>  *
>  * ==== Notes ====
>  *
>  * * This procedure can leave orphaned pristine files (files without a
>  *   corresponding SDB row) if Subversion crashes. The Pristine Store
>  *   will still operate correctly. It should be easy to teach "svn cleanup"
>  *   to safely delete these. ### Do we need to define the clean-up
>  *   procedure here?
>  *
>  * * This specification is conceptually simple, but requires completing disk
>  *   operations within SDB transactions, which may make it too inefficient
>  *   in practice. An alternative specification could use the Work Queue to
>  *   enable more efficient processing of multiple transactions.
>  *
>  *
>  * REFERENCE COUNTING
>  * ==================
>  *
>  * The Pristine Store spec above defines how texts are added and removed
>  * from the store. This spec defines how the addition and removal of
>  * pristine text references within the WC DB are co-ordinated with the
>  * addition and removal of the pristine texts themselves.
>  *
>  * One requirement is to allow a pristine text to be stored some
>  * time before the reference to it is written into the NODES table. The
>  * 'commit' code path, for example, needs to store a file's new pristine
>  * text somewhere (and the pristine store is an obvious option) and then,
>  * when the commit succeeds, update the WC to reference it.
>  *
>  * Store-then-reference could be achieved by:
>  *
>  *   (a) Store text outside Pristine Store. When commit succeeds, add it
>  *       to the Pristine Store and reference it in the WC; if commit
>  *       fails, remove the temporary text.
>  *   (b) Store text in Pristine Store with initial ref count = 0. When
>  *       commit succeeds, add the reference and update the ref count; if
>  *       commit fails, optionally try to purge this pristine text.
>  *   (c) Store text in Pristine Store with initial ref count = 1. When
>  *       commit succeeds, add the reference; if commit fails, decrement
>  *       the ref count and optionally try to purge it.
>  *
>  * Method (a) would require, in effect, implementing an ad-hoc temporary
>  * Pristine Store, which seems needless duplication of effort. It would
>  * also require changing the way the commit code path passes information
>  * around, which might be no bad thing in the long term, but the result
>  * would not appear to have any advantage over method (b).
>  *
>  * Method (b) plays well with automatically maintaining the ref counts
>  * equal to the number of in-SDB references, at the granularity of SDB
>  * txns. It requires an interlock between adding/deleting references and
>  * purging unreferenced pristines - e.g. guard each of these operations by
>  * a WC lock.
>  *   * Add a pristine & reference it => any WC lock
>  *     (To prevent purging it while adding.)
>  *   * Unreference a pristine => no lock needed.
>  *   * Unreference a pristine & purge-if-0 => Same as doing these separately.
>  *   * Purge any/all refcount==0 pristines => an exclusive WC lock.
>  *     (To prevent adding a ref while purging.)
>  *   * If a WC lock remains after a crash, then purge refcount==0 pristines.
>  *
>  * Method (c):
>  *   * ### Not sure about this one - haven't thought it through in detail...
>  *   * Add a pristine & reference in separate steps => any WC lock (?)
>  *   * Remove a reference requires ... (nothing more?)
>  *   * Find & purge unreferenced pristines requires an exclusive WC lock.
>  *   * Ref counts are sometimes too high while a WC lock is held, so
>  *     uncertain after a crash if WC locks remain, so need to be re-counted
>  *     during clean-up.
>  *
>  * We choose method (b).
>  *
>  *
>  * === Invariants in a Valid WC DB State ===
>  *
>  * * No pristine text, even if refcount == 0, will be deleted from the store
>  *   as long as any process holds any WC lock in this WC.
>  *
>  * The following conditions are always true outside of a SQL txn:
>  *
>  * * The 'checksum' column in each NODES table row is either NULL or
>  *   references a primary key in the 'pristine' table.
>  *
>  * * The 'refcount' column in each PRISTINE table row is equal to the
>  *   number of NODES table rows whose 'checksum' column references this
>  *   pristine row.
>  *
>  * The following conditions are always true
>  *   outside of a SQL txn,
>  *   when the Work Queue is empty:
>  *   (### ?) when no WC locks are held by any process:
>  *
>  * * The 'refcount' column in a PRISTINE table row equals the number of
>  *   NODES table rows whose 'checksum' column references that pristine row.
>  *   It may be zero.
>  *
>  * ==== Operating Procedures ====
>  *
>  * The steps should be carried out in the order specified.
>  *
>  * * To add a pristine text reference to the WC, obtain the text and its
>  *   checksum, and then do this while holding a WC lock:
>  *   * Add the pristine text to the Pristine Store, setting the desired
>  *     refcount >= 1.
>  *   * Add the reference(s) in the NODES table.
>  *
>  * * To remove a pristine text reference from the WC, do this while holding
>  *   a WC lock:
>  *   * Remove the reference(s) in the NODES table.
>  *   * Decrement the pristine text's 'refcount' column.
>  *
>  * * To purge an unreferenced pristine text, do this with an *exclusive*
>  *   WC lock:
>  *   * Check refcount == 0; skip if not.
>  *   * Remove it from the pristine store.
>  */
>
>
> - Julian
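
For anyone following the add/remove ordering in the quoted spec, here is a rough sketch of those two procedures in Python (stdlib sqlite3), purely for illustration. It is not Subversion's code. The function names (open_store, pristine_path, add_pristine, remove_pristine), the 'wc.db' file name, the flat 'pristine' directory layout and the cut-down table (no 'md5_checksum' column) are all assumptions made up for this example. WC locks and the NODES table are not modelled.

```python
import hashlib
import os
import sqlite3
import tempfile


def open_store(wc_dir):
    """Open (or create) a toy pristine store under wc_dir (hypothetical layout)."""
    os.makedirs(os.path.join(wc_dir, "pristine"), exist_ok=True)
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    sdb = sqlite3.connect(os.path.join(wc_dir, "wc.db"), isolation_level=None)
    sdb.execute(
        "CREATE TABLE IF NOT EXISTS pristine ("
        " checksum TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " refcount INTEGER NOT NULL)"
    )
    return sdb


def pristine_path(wc_dir, checksum):
    return os.path.join(wc_dir, "pristine", checksum)


def add_pristine(sdb, wc_dir, text, refcount=0):
    """Add a pristine: row first, then the file, all inside one SDB txn."""
    checksum = hashlib.sha1(text).hexdigest()
    path = pristine_path(wc_dir, checksum)
    sdb.execute("BEGIN IMMEDIATE")  # acquire the RESERVED lock up front
    try:
        row = sdb.execute(
            "SELECT size FROM pristine WHERE checksum = ?", (checksum,)
        ).fetchone()
        if row is None:
            sdb.execute(
                "INSERT INTO pristine (checksum, size, refcount) VALUES (?, ?, ?)",
                (checksum, len(text), refcount),
            )
        else:
            # Verify the old row matches the new metadata, then add refcounts.
            if row[0] != len(text):
                raise ValueError("existing row does not match new metadata")
            sdb.execute(
                "UPDATE pristine SET refcount = refcount + ? WHERE checksum = ?",
                (refcount, checksum),
            )
        # Leave an existing file alone; otherwise write a temp file in the
        # same directory and rename it into place so no partial file is seen.
        if not os.path.exists(path):
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
            with os.fdopen(fd, "wb") as f:
                f.write(text)
            os.replace(tmp, path)
        sdb.execute("COMMIT")
    except BaseException:
        sdb.execute("ROLLBACK")
        raise
    return checksum


def remove_pristine(sdb, wc_dir, checksum):
    """Remove a pristine if its refcount is 0; return True if it was removed."""
    sdb.execute("BEGIN IMMEDIATE")
    try:
        row = sdb.execute(
            "SELECT refcount FROM pristine WHERE checksum = ?", (checksum,)
        ).fetchone()
        # Re-check the refcount inside the txn; it may have changed.
        removable = row is not None and row[0] == 0
        if removable:
            sdb.execute("DELETE FROM pristine WHERE checksum = ?", (checksum,))
            try:
                os.remove(pristine_path(wc_dir, checksum))
            except FileNotFoundError:
                pass  # the spec would log a consistency error here
        sdb.execute("COMMIT")
    except BaseException:
        sdb.execute("ROLLBACK")
        raise
    return removable
```

With this sketch, remove_pristine() declines to delete a text whose refcount is non-zero, which corresponds to the "check refcount == 0, and abort if not" step. Holding the txn open across the file operations is exactly the cost that the Notes section of the spec worries about.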