Re: FSFS format7 status and first results

Stefan Fuhrmann Thu, 21 Feb 2013 02:06:28 -0800

On Mon, Feb 18, 2013 at 5:54 PM, Mark Phippard <markp...@gmail.com> wrote:

> On Sat, Feb 16, 2013 at 4:30 PM, Stefan Fuhrmann
> <stefan.fuhrm...@wandisco.com> wrote:
> > On Sat, Feb 16, 2013 at 5:47 PM, Mark Phippard <markp...@gmail.com> wrote:
> >>
> >> On Sat, Feb 16, 2013 at 4:52 AM, Stefan Fuhrmann
> >> <stefan.fuhrm...@wandisco.com> wrote:
> >> > Hey all,
> >> >
> >> > Just to give you an update on what is going on on that branch,
> >> > here are a few facts and numbers.  Bottom line is that there is
> >> > still a lot to do but the basic assumptions proved correct and
> >> > significant benefits can already be demonstrated.
> >> >
> >> > * about 20% of the coding is done so far
> >> > * some core features implemented:
> >> >   logical addressing, reorg upon pack, block read
> >>
> >> What do you mean by pack here?  Is it svnadmin pack?
> >
> >
> > svnadmin pack
> >
> >>
> >> Is that in any way an essential part of the performance boost?
> >
> >
> > Yes. It will place items (noderevs, representations, change lists)
> > next to each other when they will likely be requested shortly
> > after one another. For instance, it tries to concatenate all elements
> > of a deltification chain.
> >
> >>
> >> Or are your format7 repositories always packed?
> >
> >
> > They are not. Unpacked revisions will see a performance hit from
> > reading the two extra index files per revision and a boost from
> > block-read which will often fetch the whole revision with a single
> > I/O operation.
>
> So is the main difference between format 6 and 7 how the data is
> organized when they are packed?
>

Currently, yes. Plus the ability to read data from
an arbitrary data block: for every position within
a rev / pack file, we now know what data it is
and can read it directly without DAG traversal etc.
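
To illustrate the idea of knowing what lives at any position in a
rev / pack file, here is a minimal Python sketch. It is not the
format7 implementation: the Entry fields, the PositionIndex class and
its methods are hypothetical names chosen for illustration, and the
real index files are laid out differently.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One item in a rev / pack file (all field names are hypothetical)."""
    offset: int  # start position within the file
    size: int    # length in bytes
    kind: str    # e.g. "noderev", "rep", "changes"
    item: int    # illustrative logical item number


class PositionIndex:
    """Maps a file position to the item stored there (illustrative only)."""

    def __init__(self, entries):
        # Keep entries sorted by offset so lookups can use binary search.
        self.entries = sorted(entries, key=lambda e: e.offset)
        self.offsets = [e.offset for e in self.entries]

    def lookup(self, position):
        """Return the entry covering `position`, or None if there is none."""
        i = bisect.bisect_right(self.offsets, position) - 1
        if i < 0:
            return None
        entry = self.entries[i]
        return entry if position < entry.offset + entry.size else None

    def items_in_block(self, block_start, block_size):
        """Return every entry that overlaps the given block of the file."""
        block_end = block_start + block_size
        return [e for e in self.entries
                if e.offset < block_end and e.offset + e.size > block_start]


index = PositionIndex([
    Entry(offset=0, size=100, kind="noderev", item=1),
    Entry(offset=100, size=50, kind="rep", item=2),
    Entry(offset=150, size=30, kind="changes", item=3),
])
```

With such a table, a reader that has just fetched a block can ask
items_in_block() which items it already holds and hand all of them to
the cache, instead of finding them one by one through the DAG.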

Thus, we now try to hit any block in a RAID system
only once. However, there are limitations to our
caching heuristics that will make this hard to achieve
in some scenarios. Further work will address this in
two ways: improve short-term cache hit rates to
nearly 100% and reduce the number of items to cache.

The latter requires further changes to the on-disk
representation of data: We need to bundle them into
larger blocks ("containers"). As a nice side-effect,
we will save another 30 to 50% of disk space.

> > Quite a number of reasons:
> >
> > * easy setup
> > * minimal overhead (I want to get as close to measuring pure
> >   FS layer performance as possible)
> > * easy to debug and profile
>
> I get that for development purposes, but I would personally like to
> see that the caching etc. is yielding benefits when HTTP is used.
>

Apache should only add constant overhead, i.e. the
absolute savings should be roughly the same. Once
the cache-server branch is finished, the difference
in cache efficiency & effect between svnserve and
Apache should be gone.

>
> > '--enable-optimize' is new in 1.8. It should probably be documented
> > somewhere but I'm not sure how safe it is to *recommend* it to
> > packagers. The optimizations are quite aggressive and might break
> > unclean code.
> >
> > I used it in conjunction with '-march=native' to minimize CPU time
> > vs. I/O time. It saved 3 .. 5% of CPU cycles in my tests.
>
> OK.
>
> BTW, how are you managing your branch?  I tried merging it back to
> trunk to get an idea on the diff and there were a lot of text and tree
> conflicts.  I thought I had seen you doing synch merges from trunk in
> the past so I assumed it would merge back.
>

Hm. I split fsfs.c and .h into multiple files on the
branch and the first merge from /trunk required
significant manual intervention. Since then, merges
have been clean because fsfs.* wasn't touched
on /trunk.

If I understand Julian's merge changes in 1.8,
reintegrating should work because there has been
no cherry picking etc.

-- Stefan^2

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download