[ Just changing the subject to get better visibility, so maybe more people will read this and do some experiments with fsfs-reorg (not with production data of course), to check the impact on "cold I/O". Thanks for the explanation, Stefan. ]
On Sat, Oct 13, 2012 at 7:06 PM, Stefan Fuhrmann
<stefan.fuhrm...@wandisco.com> wrote:
> On Thu, Oct 11, 2012 at 1:32 AM, Johan Corveleyn <jcor...@gmail.com> wrote:
>>
>> On Wed, Oct 10, 2012 at 7:09 PM, Stefan Fuhrmann
>> <stefan.fuhrm...@wandisco.com> wrote:
...
>> > BTW, that code is not supposed to be *ever*
>> > used for production data.
>>
>> Ok, good to know. I just executed the tool and saw the prominent
>> warning, so that's pretty clear.
>
>
> What I'm trying to say goes even beyond that.
> This tool will (probably) never evolve into something
> that would be used outside our dev community.
>
>>
>> [ ... ]
>>
>> > Would be nice if people could use it to test /
>> > evaluate the results. The whole idea is to verify
>> > the method before attempting significant changes
>> > to the FSFS layer in 1.9.
>>
>> Can you summarize a bit (maybe you explained it already in some notes
>> file, but I don't quite remember) what it does again? What's the goal
>> really? Is it about reshuffling the data inside the pack files to be
>> more I/O efficient, while maintaining compatibility with existing
>> servers (so a reorg'ed repository can be read by any 1.x server)? If
>> so, how does it do that actually?
>
>
> SVN 1.8 will have 100% cache coverage in the sense
> that except for the format, fsfs.conf and friends, you
> can serve all r/o requests from the cache once that
> got populated.
>
> The next logical step is to reduce the amount of I/O
> (physical seeks as well as data transfer). The basic
> idea is laid out in the fsfs-improvements notes but the
> tool implementation goes a bit beyond that:
>
> * "overlay" revisions within a pack file, i.e. the offset
>   ranges overlap in the physical file
> * put all the "changes" lists at the beginning of the pack file
>   (used for log only)
> * starting at /@HEAD, add node-rev, followed by reps
>   (in delta-order). Once a node is complete, continue
>   with its youngest sub-node until the tree is complete
> * Continue with the youngest element not covered.
>
> The output should be compatible with SVN 1.6+
> (if the input was). Older formats are not supported -
> for simplicity.
>
> As a result, many related rep deltas should sit next to
> each other. Also, elements relevant for newer nodes
> should be at the beginning of the file and older ones
> tend to be moved to the end. Finally, we keep nodes
> that are next to each other in the tree close to one
> another in the resulting pack file.
>
> For the ASF repo, I've got a ~3 times speedup for
> a "cold" checkout of SVN trunk (repo on a USB disk).
>
> But I may change / refine the placement strategy
> to e.g. put all props with mergeinfo in one place.
>
>> And, if we're thinking about evaluating the results: what should one
>> focus on? Any particular use cases that should get a significant
>> positive effect? Any use cases that might possibly be negatively
>> affected?
>
>
> There are two main points of interest for me:
>
> * does the conversion work or is it missing something
>   for your repo?
> * does "cold" I/O go down? By how much and for
>   which operations?
>
> I found that using a USB disk to store the repo is
> actually pretty neat because you can simply unplug
> it and the OS will discard all cached data.
>
>
> -- Stefan^2.

--
Johan
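
[ For anyone who wants to reason about the placement order described in the bullet list above, here is a minimal Python sketch of how I read it. It is not taken from fsfs-reorg. The Node class, the "noderev:" / "rep:" labels and the function names are all hypothetical, and it assumes each node's reps are already given in delta order. The "changes" lists and the overlaying of revisions are left out; it only models the tree walk from HEAD plus the "youngest element not covered" fallback. ]

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical stand-in for a node-rev; not the FSFS on-disk structure."""
    name: str                      # assumed unique, used as the identity
    revision: int                  # revision that created this node-rev
    reps: List[str] = field(default_factory=list)      # assumed to be in delta order
    children: List["Node"] = field(default_factory=list)


def place_tree(root: Node, placed: Set[str], out: List[str]) -> None:
    """Emit a node-rev, then its reps, then descend youngest sub-node first."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.name in placed:
            continue
        placed.add(node.name)
        out.append(f"noderev:{node.name}")
        out.extend(f"rep:{rep}" for rep in node.reps)
        # Push oldest first so that the youngest child is popped next.
        for child in sorted(node.children, key=lambda n: n.revision):
            stack.append(child)


def layout(head_root: Node, all_nodes: List[Node]) -> List[str]:
    """Place the HEAD tree first, then whatever is left, youngest first."""
    placed: Set[str] = set()
    out: List[str] = []
    place_tree(head_root, placed, out)
    for node in sorted(all_nodes, key=lambda n: n.revision, reverse=True):
        if node.name not in placed:
            place_tree(node, placed, out)
    return out


# Tiny made-up repository: "mid" and "old" are no longer reachable from HEAD.
a1 = Node("a1", 2, ["a1@2"])
a = Node("a", 5, ["a@5", "a@4"], [a1])
b = Node("b", 3, ["b@3"])
root = Node("root", 5, ["root@5"], [a, b])
mid = Node("mid", 4, ["mid@4"])
old = Node("old", 1, ["old@1"])

order = layout(root, [root, a, a1, b, old, mid])
for item in order:
    print(item)
```

[ In this toy layout everything reachable from HEAD ends up at the front, each node-rev is directly followed by its own reps, and the unreachable leftovers trail at the end, youngest first. That seems to match the intent that data for newer nodes sits at the beginning of the pack file, but the real tool may well order things differently. ]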