On Sun, Dec 12, 2010 at 9:23 AM,  <stef...@apache.org> wrote:
> Author: stefan2
> Date: Sun Dec 12 15:23:18 2010
> New Revision: 1044833
>
> URL: http://svn.apache.org/viewvc?rev=1044833&view=rev
> Log:
> Get some ideas for FSFS format improvements written down.
>
> * notes/fsfs-improvements.txt: new file
>
> Added:
>    subversion/trunk/notes/fsfs-improvements.txt   (with props)
>
> Added: subversion/trunk/notes/fsfs-improvements.txt
> URL: 
> http://svn.apache.org/viewvc/subversion/trunk/notes/fsfs-improvements.txt?rev=1044833&view=auto
> ==============================================================================
> --- subversion/trunk/notes/fsfs-improvements.txt (added)
> +++ subversion/trunk/notes/fsfs-improvements.txt Sun Dec 12 15:23:18 2010
> @@ -0,0 +1,224 @@
> +Introduction
> +------------
> +
> +The FSFS external data format can be changed to allow for
> +significantly reduced I/O overhead without changing the
> +fundamental ideas behind FSFS.
> +
> +The whole idea is to rearrange packed revision info in a
> +new FSFS format "6".  A two-way conversion between format
> +"5" and "6" should be possible as well as supporting both
> +formats for different repositories on the same server.

Generally, I like these ideas.

As I understand it, they would be implemented as part of the packing
process, or the pack files would be rewritten as part of the repos
format upgrade.  This could get a bit tricky, since FSFS depends on
static file offsets for referencing data within revision files.  Not
impossible, just tricky.

> +
> +
> +Revision Order
> +--------------
> +
> +FSFS format "5" packs revisions by putting revision 0 at
> +the beginning of the file and concatenating the others in
> +ascending order.  This intuitive design is counter-productive
> +because we always trace data HEAD -> r0 and scanning a file
> +backwards is expensive.
> +
> ++-------+
> +| rev 0 |
> ++-------+
> +| rev 1 |
> ++-------+
> +
> +:  ...  :
> +
> ++-------+
> +| rev N |
> ++-------+
> +<EOF>
> +
> +A counter-intuitive but more efficient reversed order (de-
> +scending revision order in a packed file) results in always
> +reading forward through a file.
> +
> ++-------+
> +| rev N |
> ++-------+
> +
> +:  ...  :
> +
> ++-------+
> +| rev 1 |
> ++-------+
> +| rev 0 |
> ++-------+
> +<EOF>
> +
> +
> +Don't Interleave Revision and Content Data
> +------------------------------------------
> +
> +Format "5" keeps the each revision as a single block.  When
> +re-constructing a node content from the diffs,  we jump
> +through a number of distant revision blocks.  For each node.
> +
> ++--------------+
> +| rev N Reps   |
> ++--------------+
> +| rev N Header |
> ++--------------+
> +
> +:     ...      :
> +
> ++--------------+
> +| rev 1 Reps   |
> ++--------------+
> +| rev 1 Header |
> ++--------------+
> +| rev 0 Reps   |
> ++--------------+
> +| rev 0 Header |
> ++--------------+
> +<EOF>
> +
> +Instead,  delta-info should be removed from the revision
> +blocks and put at the end of the file.  This speeds up
> +header-only operations like "log" without impairing node
> +content lookup.  And it lays the foundation for further
> +improvements.
> +
> ++--------------+
> +| rev N Header |
> ++--------------+
> +
> +:     ...      :
> +
> ++--------------+
> +| rev 1 Header |
> ++--------------+
> +| rev 0 Header |
> ++--------------+
> +| rev N Reps   |
> ++--------------+
> +
> +:     ...      :
> +
> ++--------------+
> +| rev 1 Reps   |
> ++--------------+
> +| rev 0 Reps   |
> ++--------------+
> +<EOF>
> +
> +The only downside is that putting headers first is somewhat
> +more computationally expensive since offsets needs to be
> +calculated in advance.  This is made even more difficult by
> +using a variable length encoding for those offsets.

Does the statement about computation cost apply to the initial placement,
or subsequent accesses?  (I would guess the former.)

> +
> +
> +Group the Representations by Delta Tree
> +---------------------------------------
> +
> +All diffs that are based on the same node within the packed
> +revision file are put in one place.  Re-constructing a node
> +would then involve reading the revision headers (relatively
> +close together) and the content deltas after that (again
> +with high locality).
> +
> ++------------------+
> +| rev N Header     |
> ++------------------+
> +
> +:       ...        :
> +
> ++------------------+
> +| rev 1 Header     |
> ++------------------+
> +| rev 0 Header     |
> ++------------------+
> +| node tree A Reps |
> ++------------------+
> +| node tree B Reps |
> ++------------------+
> +
> +:       ...        :
> +
> ++------------------+
> +| node tree Z Reps |
> ++------------------+
> +<EOF>
> +
> +We can optimize that even further by exploiting the skip-
> +delta information: the "walks" through the delta tree for a
> +given node will merge bit by bit into a main "trunk". The
> +nodes on these paths can be re-ordered such that very few
> +seek() operations are required on average to reconstruct
> +the node content in this file.
> +
> +Example of 8 changes in revs r0 .. r7 (for simplicity),
> +looking at the delta-info only ("->" means "stored as diff
> +against"):
> +
> +       r0
> +       r1->r0
> +       r2->r0
> +       r3->r2
> +       r4->r0
> +       r5->r4
> +       r6->r4
> +       r7->r6
> +
> +Default ordering r7,r6,r5,r4,r3,r2,r1,r0 requires
> +3/3/2/2/2/2/1/1 seeks.  For 2^N changes (N integer > 0),
> +we need (N+1)/2 seeks on average.
> +
> +A path-optimized ordering r1,r3,r2,r5,r7,r6,r4,r0
> +requires 2/2/2/2/1/1/1/1 seeks.  It's (N+3)/4 on average.
> +That optimized ordering can easily be constructed by
> +
> +       * select the highest rev not covered, yet
> +       * prepend its diff path to the existing result
> +         (until we meet a revision at already is in
> +         in the result)
> +       * repeat until all revs are covered
> +
> +
> +Move the Manifest Info into the Packed File
> +-------------------------------------------
> +
> +Once we mastered the offset pre-calculation issue (see
> +above),  we may as well put the manifest info at the be-
> +ginning of the file.  This will benefit "log" in particular
> +as only one consecutive chunk of data needs to be read
> +from only one file.

I don't completely understand the benefits we get from this, since the
manifest is read one and then cached.  I would think that the accesses
into the pack file would dominate I/O, and the placement of the
manifest wouldn't matter much (though I could certainly be wrong).

> +
> ++------------------+
> +| rev 0 Offset     |
> ++------------------+
> +| rev 1 Offset     |
> ++------------------+
> +
> +:       ...        :
> +
> ++------------------+
> +| rev N Offset     |
> ++------------------+
> +| rev N Header     |
> ++------------------+
> +
> +:       ...        :
> +
> ++------------------+
> +| rev 1 Header     |
> ++------------------+
> +| rev 0 Header     |
> ++------------------+
> +| node tree A Reps |
> ++------------------+
> +| node tree B Reps |
> ++------------------+
> +
> +:       ...        :
> +
> ++------------------+
> +| node tree Z Reps |
> ++------------------+
> +<EOF>
> +
> +
>
> Propchange: subversion/trunk/notes/fsfs-improvements.txt
> ------------------------------------------------------------------------------
>    svn:eol-style = native
>
>
>

Overall, it looks like a good plan.  Thanks!

-Hyrum

Reply via email to