On Mon, Feb 18, 2013 at 5:54 PM, Mark Phippard <markp...@gmail.com> wrote:
> On Sat, Feb 16, 2013 at 4:30 PM, Stefan Fuhrmann
> <stefan.fuhrm...@wandisco.com> wrote:
> > On Sat, Feb 16, 2013 at 5:47 PM, Mark Phippard <markp...@gmail.com>
> > wrote:
> >>
> >> On Sat, Feb 16, 2013 at 4:52 AM, Stefan Fuhrmann
> >> <stefan.fuhrm...@wandisco.com> wrote:
> >> > Hey all,
> >> >
> >> > Just to give you an update on what is going on on that branch,
> >> > here are a few facts and numbers. Bottom line is that there is
> >> > still a lot to do but the basic assumptions proved correct and
> >> > significant benefits can already be demonstrated.
> >> >
> >> > * about 20% of the coding is done so far
> >> > * some core features implemented:
> >> >   logical addressing, reorg upon pack, block read
> >>
> >> What do you mean by pack here? Is it svnadmin pack?
> >
> > svnadmin pack
> >
> >>
> >> Is that in any way an essential part of the performance boost?
> >
> > Yes. It will place items (noderevs, representations, change lists)
> > next to each other when they will likely be requested shortly
> > after one another. For instance, it tries to concatenate all
> > elements of a deltification chain.
> >
> >>
> >> Or are your format7 repositories always packed?
> >
> > They are not. Unpacked revisions will see a performance hit from
> > reading the two extra index files per revision and a boost from
> > block-read which will often fetch the whole revision with a single
> > I/O operation.
>
> So is the main difference between format 6 and 7 how the data is
> organized when they are packed?
>

Currently, yes. Plus the ability to read data from an arbitrary
data block: for every position within a rev / pack file, we now
know what data is stored there and can read it directly without
DAG traversal etc. Thus, we now try to hit any block in a RAID
system only once. However, there are limitations to our caching
heuristics that will make this hard to achieve in some scenarios.
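To make the "we know what sits at every position" point more concrete,
here is a minimal Python sketch of the idea. It is not the actual FSFS
code or on-disk index format: the names IndexEntry and OffsetIndex, the
64 KiB block size, and the item layout used in the usage note are all
assumptions made up for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Assumed block size, chosen only for illustration.
BLOCK_SIZE = 64 * 1024


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index record: which item occupies which byte range."""
    offset: int      # start position within the rev / pack file
    size: int        # length of the item in bytes
    item_type: str   # e.g. "noderev", "rep", "changes"
    item_id: int     # logical item number


class OffsetIndex:
    """Hypothetical offset-to-item index for one rev / pack file."""

    def __init__(self, entries):
        # Keep entries sorted by offset so lookups can use binary search.
        self._entries = sorted(entries, key=lambda e: e.offset)
        self._offsets = [e.offset for e in self._entries]

    def item_at(self, position):
        """Return the entry covering `position`, or None if there is none."""
        i = bisect_right(self._offsets, position) - 1
        if i < 0:
            return None
        entry = self._entries[i]
        if position < entry.offset + entry.size:
            return entry
        return None

    def items_in_block(self, position, block_size=BLOCK_SIZE):
        """Return all entries overlapping the block that contains `position`.

        A block read can hand every one of these items to the cache, so the
        same block does not need to be fetched from disk a second time.
        """
        start = (position // block_size) * block_size
        end = start + block_size
        return [e for e in self._entries
                if e.offset < end and e.offset + e.size > start]
```

With such an index, a reader that needs one item can fetch the whole
surrounding block, call items_in_block() for that position and cache
everything it finds there, instead of following references item by item.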
Further work will address these caching limitations in two ways:
improve short-term cache hit rates to close to 100% and reduce the
number of items to cache. The latter requires further changes to the
on-disk representation of data: we need to bundle items into larger
blocks ("containers"). As a nice side effect, we will save another
30 .. 50% of disk space.

> > Quite a number of reasons:
> >
> > * easy setup
> > * minimal overhead (I want to get as close to measuring pure
> >   FS layer performance as possible)
> > * easy to debug and profile
>
> I get that for development purposes, but I would personally like to
> see that the caching etc. is yielding benefits when HTTP is used.
>

Apache should only add constant overhead, i.e. the absolute savings
should be roughly the same. Once the cache-server branch is finished,
the difference in cache efficiency & effect between svnserve and
Apache should be gone.

> > '--enable-optimize' is new in 1.8. It should probably be documented
> > somewhere but I'm not sure how safe it is to *recommend* it to
> > packagers. The optimizations are quite aggressive and might break
> > unclean code.
> >
> > I used it in conjunction with '-march=native' to minimize CPU time
> > vs. I/O time. It saved 3 .. 5% of CPU cycles in my tests.
>
> OK.
>
> BTW, how are you managing your branch? I tried merging it back to
> trunk to get an idea on the diff and there were a lot of text and tree
> conflicts. I thought I had seen you doing synch merges from trunk in
> the past so I assumed it would merge back.
>

Hm. I split fsfs.c and .h into multiple files on the branch and
the first merge from /trunk required significant manual intervention.
Since then, merges have been clean because fsfs.* wasn't touched
on /trunk. If I understand Julian's merge changes in 1.8 correctly,
reintegrating should work because there has been no cherry picking etc.

-- Stefan^2