On 29.12.2010 01:58, Johan Corveleyn wrote:
On Sun, Dec 12, 2010 at 4:23 PM, Stefan Fuhrmann
<stefanfuhrm...@alice-dsl.de>  wrote:
On 19.10.2010 15:10, Daniel Shahaf wrote:
Greg Stein wrote on Tue, Oct 19, 2010 at 04:31:42 -0400:
Personally, I see [FSv2] as a broad swath of API changes to align our
needs with the underlying storage. Trowbridge noted that our current
API makes it *really* difficult to implement an effective backend. I'd
also like to see a backend that allows for parallel PUTs during the
commit process. Hyrum sees FSv2 as some kind of super-key-value
storage with layers on top, allowing for various types of high-scaling
mechanisms.
At the retreat, stefan2 also had some thoughts about this...

[This is just a brain-dump for 1.8+]

While working on the performance branch I made some
observations concerning the way FSFS organizes data
and how that could be changed for reduced I/O overhead.

notes/fsfs-improvements.txt contains a summary of what
could be done to improve FSFS before FS-NG. A later
FS-NG implementation should then still benefit from the
improvements.
+(number of fopen calls during a log operation)

I like this proposal a lot. As I mentioned before, we are running
our FSFS back-end on a SAN over NFS (and I suspect we're not the only
company doing this). In this environment, the server-side I/O of SVN
(especially the amount of random reads and fopen calls during e.g.
log) is often the major bottleneck.

There is one question going around in my head though: won't you have
to change/rearrange a lot of the FS layer code (and maybe repos
layer?) to benefit from this new format?
Maybe. But as far as I understand the current
FSFS structure, data access is mainly chasing
pointers, i.e. reading relative or absolute byte
offsets and moving there for the next piece of
information. If everything goes well, none of that
code needs to change; the revision packing
algorithm will simply produce different offset
values.
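To illustrate the pointer-chasing argument, here is a minimal sketch. It uses a toy record layout that I made up for this example (an 8-byte "next" offset, a 2-byte length, then the payload); it is not the real FSFS on-disk format, and write_chain / read_chain are hypothetical names. The point is only that the reader stays the same while the writer chooses the physical order.

```python
import io
import struct

# Toy layout (an assumption, NOT the real FSFS format):
#   8-byte big-endian offset of the next record, 2-byte payload length, payload.
END = 0xFFFFFFFFFFFFFFFF          # marker for "no next record"
HEADER = struct.Struct(">QH")


def write_chain(payloads, order):
    """Store payloads in physical `order`, linked in logical order 0, 1, 2, ..."""
    # First pass: work out where each record will land.
    offsets = {}
    pos = 0
    for idx in order:
        offsets[idx] = pos
        pos += HEADER.size + len(payloads[idx])
    # Second pass: write each record with the offset of its logical successor.
    buf = io.BytesIO()
    for idx in order:
        nxt = offsets[idx + 1] if idx + 1 < len(payloads) else END
        buf.write(HEADER.pack(nxt, len(payloads[idx])))
        buf.write(payloads[idx])
    return buf.getvalue(), offsets[0]


def read_chain(data, start):
    """Follow the offsets from `start`; return (payloads, number of seeks)."""
    f = io.BytesIO(data)
    out = []
    seeks = 0
    pos = start
    while pos != END:
        if pos != f.tell():       # only seek when the next record is elsewhere
            f.seek(pos)
            seeks += 1
        nxt, length = HEADER.unpack(f.read(HEADER.size))
        out.append(f.read(length))
        pos = nxt
    return out, seeks
```

With three payloads written in physical order [2, 0, 1], read_chain needs two seeks; written in order [0, 1, 2], the same reader needs none and returns the same data.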
The current code is written in a certain way, not particularly
optimized for this new format (I seem to remember "log" does around 10
fopen calls for every interesting rev file, each time reading a
different part of it). Also, if an operation currently needs to access
many revisions (like log or blame), it doesn't take advantage at all
of the fact that they might be in a single packed rev file. The pack
file is opened and seeked just as often as the individual rev files
would have been in total.
The fopen() calls should be eliminated by the
file handle cache. IOW, they should already be
addressed on the performance branch. Please
let me know if that is not the case.
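For readers unfamiliar with the idea, a file handle cache can be sketched roughly as below. This is not the code from the performance branch; FileHandleCache, read_at and the LRU eviction policy are assumptions made for illustration. The injectable `opener` exists only so the sketch can be tried without real files.

```python
from collections import OrderedDict


class FileHandleCache:
    """Hypothetical LRU cache of open file handles (illustration only)."""

    def __init__(self, capacity=16, opener=open):
        self.capacity = capacity
        self.opener = opener          # called as opener(path, "rb")
        self.handles = OrderedDict()  # path -> open handle, oldest first
        self.open_calls = 0           # how often we really had to open a file

    def get(self, path):
        handle = self.handles.get(path)
        if handle is None:
            # Cache miss: this is the only place a file is actually opened.
            handle = self.opener(path, "rb")
            self.open_calls += 1
            self.handles[path] = handle
            if len(self.handles) > self.capacity:
                # Evict and close the least recently used handle.
                _, evicted = self.handles.popitem(last=False)
                evicted.close()
        else:
            # Cache hit: mark the handle as most recently used.
            self.handles.move_to_end(path)
        return handle

    def read_at(self, path, offset, length):
        """Read `length` bytes at `offset`, reusing a cached handle if any."""
        handle = self.get(path)
        handle.seek(offset)
        return handle.read(length)

    def close(self):
        for handle in self.handles.values():
            handle.close()
        self.handles.clear()
```

Ten read_at() calls against the same pack file then cost one open instead of ten; the seeks remain, which is the part the new format is meant to address.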

FSFS format 6 would primarily reduce the number
of seek() and read() calls. Once the seek()s are
"in check", the size of the read buffer might become
configurable: remote file access might benefit from
larger buffers, e.g. equal to the amount of data the
network can transfer in 1 to 10 ms.
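As a rough worked example of that sizing rule: the helper below computes how many bytes a link moves in a given time window and rounds up to a block multiple. The function name, the 4 KiB granularity and the 1 Gbit/s figure used afterwards are assumptions for illustration, not measured values or existing settings.

```python
def read_buffer_size(throughput_bits_per_s, window_ms, granularity=4096):
    """Bytes the link can move in `window_ms`, rounded up to `granularity`."""
    # bits/s * ms -> bytes: divide by 8 (bits per byte) and 1000 (ms per s).
    bytes_in_window = throughput_bits_per_s * window_ms // (8 * 1000)
    # Ceiling division, with a minimum of one block.
    blocks = max(1, -(-bytes_in_window // granularity))
    return blocks * granularity
```

Assuming a 1 Gbit/s link, read_buffer_size(1_000_000_000, 1) gives 126976 bytes (about 124 KiB) and read_buffer_size(1_000_000_000, 10) gives 1253376 bytes (about 1.2 MiB).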
So: how will the current code be able to take advantage of this new
format? Won't this require a major effort to restructure that code?
Old servers won't be able to read format 6 repos
(maybe they will, but there is no guarantee). If a
large-scale restructuring of the code were
necessary, I might not be able to do and validate it.

The packing code, however, will probably be
completely rewritten.
(This reminds me of the current difficulty (as I can see it, as an
innocent bystander) with the WC-NG rewrite: theoretically it should be
very fast, but the "higher level" code is still largely based upon the
old principles. So to take advantage of it, certain things have to be
changed at the higher level, making operations work "dir-based" or
"tree-based" instead of file-based, etc.).
Well, the official goal is still to make 1.7 clients
faster than 1.6 for every operation. But there will
certainly be room for improvement in 1.8.

-- Stefan^2.
