On Thu, Sep 08, 2011 at 01:08:33AM -0000, danie...@apache.org wrote:
> Author: danielsh
> Date: Thu Sep 8 01:08:33 2011
> New Revision: 1166489
>
> URL: http://svn.apache.org/viewvc?rev=1166489&view=rev
> Log:
> On the fs-successor-ids branch, actually implement sharding.
>
> Found by: stsp
>
> * subversion/libsvn_fs_fs/fs_fs.c
>   (FSFS_SUCCESSORS_REVISIONS_PER_SHARD): New helper.
>   (path_successor_ids_shard, path_successor_ids,
>    path_successor_node_revs_shard, path_successor_node_revs):
>     Fix path calculations.
>   (update_successor_ids_file):
>     Fix checks for 'New shard' and 'New file in a shard'.
>
Just to make sure we both have the same idea:

Each file in the successor store is responsible for a fixed number of
revisions (currently 1000). max-files-per-dir tells us how many files
can be in a single directory. Once a directory holds max-files-per-dir
files, we open a new directory and store further files there instead.

So, assuming max-files-per-dir is 1000, I would expect sharding within
the successors tree to behave like this:

  filename:                   file stores successor data created in:
  db/successors/ids/0/0       r0..r999
  db/successors/ids/0/1       r1000..r1999
  ...                         ...
  db/successors/ids/0/999     r999000..r999999
  db/successors/ids/1/0       r1000000..r1000999
  ...                         ...

Data for the first million revs goes into the first shard, data for the
second million revs goes into the second shard, etc.

Is this what you've implemented?

I probably would have used FSFS_SUCCESSORS_FILES_PER_SHARD instead of
FSFS_SUCCESSORS_REVISIONS_PER_SHARD, and then computed the filename
based on that number. I don't like thinking of it in terms of
"revisions per shard" because the numbers get so big :)
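
A rough sketch of the path calculation described above, written in Python for brevity rather than in the C of fs_fs.c. It is not the code on the branch: the function name and both constants are made up for illustration, and it assumes 1000 revisions per file and a max-files-per-dir of 1000.

```python
# Hypothetical constants, not the names used in fs_fs.c.
REVISIONS_PER_FILE = 1000  # "currently 1000" per the description above
FILES_PER_SHARD = 1000     # assumed value of max-files-per-dir


def successor_ids_path(rev,
                       revs_per_file=REVISIONS_PER_FILE,
                       files_per_shard=FILES_PER_SHARD):
    """Return the successor-ids file expected to hold data for revision `rev`."""
    if rev < 0:
        raise ValueError("revision numbers are non-negative")
    # Which file (counted across all shards) covers this revision.
    file_index = rev // revs_per_file
    # Which directory (shard) that file lives in, and its name inside it.
    shard = file_index // files_per_shard
    name = file_index % files_per_shard
    return "db/successors/ids/%d/%d" % (shard, name)
```

Computing the path from a files-per-shard count keeps every number small: the shard is a plain division of the file index, and the revision count per shard (here one million) never needs to appear as a constant.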