On Tue, Nov 11, 2014 at 8:40 PM, Evgeny Kotkov <evgeny.kot...@visualsvn.com> wrote:
> Branko Čibej <br...@wandisco.com> writes:
>
> >> From the performance point of view there will be no big benefits to enable
> >> log addressing for an existing repository, because the existing old part
> >> of the repository will remain to be addressed physically.
> >
> > I disagree with your assessment. Certainly, as long as there are "live"
> > delta chains in the repository that reach all the way into
> > physically-indexed content, there will less performance benefit from
> > logical addressing than in a "pure" FSFSv7. But this state will not
> > persist "forever", certainly not for actively changed content.
>

[working my way up through my TODO stack.]

> I did a small attempt in measuring the performance benefits of the mixed-mode
> addressing. Please note that these results are only provided for the Windows
> platform and only cover basic operations over two protocols. My tests were
> done under Windows 8.1 Professional (Apache HTTP Server 2.2.29, serf 1.3.8),
> a part of the batch file covering the 'file://' protocol is attached.
>

Thanks for running the tests, Evgeny. I accept r1637184 and would only like
to comment on a few details in your data, so that we don't operate on
different assumptions. To manage users' expectations, I added a section to
our release notes that explains when and how format 7 will be useful:
http://subversion.apache.org/docs/release-notes/1.9#format7-comparison

> I used the http://tortoisesvn.googlecode.com/svn/ repository (25851 revisions)
> in my experiments. What I did was building 1.9.0 binaries from r1637183 and
> r1637184. Right after that I started examining, what would have happened with
> the performance in three different scenarios:
>
> - The whole repository (25851 revisions) received an upgrade to FSFS7, and all
>   revisions are physically addressed, i.e. no mixed-mode addressing happened.
>   This is the default upgrade behavior as of r1637184.
>

My mental model of "mixed addressing" is that a certain percentage x of the
request is carried out on new-format data and the remainder on the old part.
That should result in a linear combination of "old" and "new" speed:

    mixed = x*new + (1-x)*old

With an additional run on a completely new-format repository, we would be
able to estimate the portion "x" of a request that benefits from the new
format:

    x = (old - mixed) / (old - new)

Because revision size and content change over time as the project matures,
this is only an estimate.

> - The repository received an upgrade to FSFS7 with mixed-mode addressing and
>   has been accumulating new logically addressed revisions for one year. A
>   corresponding revision span is the following:
>
>     (r24752, 9/10/2013 → r25851, 9/10/2014)
>

The new addressing scheme will only be used from the next shard on, i.e.
r25000, and it will speed things up only once that shard is packed. Since
there is no full shard yet, we won't see an improvement.

> - The repository received an upgrade to FSFS7 with mixed-mode addressing and
>   has been accumulating new logically addressed revisions for three years. A
>   corresponding revision span is the following:
>
>     (r21959, 9/10/2011 → r25851, 9/10/2014)
>

This gives 3 reordered packed shards out of 25. We should expect a 10..15%
speedup in "svn log -v", which reads all revs, and possibly more for export /
checkout, which concentrate on later revisions.

> In one year (r24752, 9/10/2013 → r25851, 9/10/2014), the performance boost
> from using the mixed addressing mode would be the following:
>
> (http://)
>
> svn-bench null-log unpacked     15.765 → 15.682 s (0.5 % gain)
> svn-bench null-log packed       16.811 → 16.400 s (2.4 % gain)
> svn-bench null-log -v unpacked  16.236 → 16.130 s (0.7 % gain)
> svn-bench null-log -v packed    17.166 → 16.921 s (1.4 % gain)
>

I assume you ran all tests from hot OS caches. If that is the case, we won't
see the differences in I/O.
But at least your numbers show that there is no significant difference in
CPU load. Apart from that, two effects are visible here: authz implies '-v'
on the request side, and packed revprops are slower than non-packed ones.
The latter has recently been fixed by faster parsers and smaller pack size
defaults.

> svn-bench null-export unpacked  43.808 → 43.644 s (0.4 % gain)
> svn-bench null-export packed    43.010 → 43.039 s (0.1 % loss)
>

Despite running multiple requests in parallel, this is much slower than
file:// access. The reason is that for every node, the mod_dav_svn access
pattern requires a full DAG walk starting at some "random" revision. There
is an experimental patch somewhere in the backlog that effectively eliminates
this overhead. I'll polish it and post it to the dev@ list - maybe it's
something we want to fix in 1.9.

> (file://)
>
> svn-bench null-log unpacked      3.303 →  3.276 s (0.8 % gain)
> svn-bench null-log packed        5.902 →  5.947 s (0.8 % gain)
>

The parser overhead is very visible here.

> svn-bench null-log -v unpacked  12.530 → 12.688 s (1.3 % loss)
> svn-bench null-log -v packed    13.514 → 13.545 s (0.2 % gain)
> svn-bench null-export unpacked  12.362 → 12.434 s (0.6 % gain)
> svn-bench null-export packed    12.316 → 12.170 s (1.2 % gain)
>
> In three years (r21959, 9/10/2011 → r25851, 9/10/2014), the performance boost
> from using the mixed addressing mode would be the following:
>
> (http://)
>
> svn-bench null-log unpacked     15.765 → 15.313 s (2.9 % gain)
> svn-bench null-log packed       16.811 → 16.193 s (3.7 % gain)
> svn-bench null-log -v unpacked  16.236 → 15.596 s (3.9 % gain)
> svn-bench null-log -v packed    17.166 → 16.648 s (3.0 % gain)
> svn-bench null-export unpacked  43.808 → 43.930 s (0.3 % loss)
> svn-bench null-export packed    43.010 → 43.169 s (0.4 % loss)
>
> (file://)
>
> svn-bench null-log unpacked      3.303 →  3.413 s (3.3 % loss)
> svn-bench null-log packed        5.902 →  5.942 s (0.7 % loss)
> svn-bench null-log -v unpacked  12.530 → 12.458 s (0.6 % gain)
> svn-bench null-log -v packed    13.514 → 13.164 s (2.6 % gain)
> svn-bench null-export unpacked  12.362 → 12.945 s (4.7 % loss)
> svn-bench null-export packed    12.316 → 12.537 s (1.8 % loss)
>

For unpacked data, we wouldn't expect much of a difference. Running the
tests from entirely cold OS caches, I get about 15% faster null-exports with
an "x" of 45% (ra_local on SSD). Surprisingly, null-log -v is 25% faster
with an "x" of 30%. The reason is that later revisions happen to be 3x as
large as older ones, so reordering data saves much more I/O in later pack
files than it does for old ones. Hence, we see twice the expected impact.

> I do not want to make any conclusions on this topic. However, my results do
> not show any obvious advantage of having the mixed-mode addressing enabled for
> the sample (http://tortoisesvn.googlecode.com/svn/) repository. Even after
> *three* years of logically addressed revisions landing into the repository,
> the performance gains still fluctuate around zero.
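
To make the linear mixing model from earlier in this mail concrete, here is
a minimal Python sketch of both directions of the formula. The function
names (predict_mixed, estimate_x) and the timings in the usage part are
hypothetical illustrations, not svn-bench results from this thread; in
particular, no "new"-only measurement is given above.

```python
def predict_mixed(old, new, x):
    """Expected mixed-mode time under the model mixed = x*new + (1-x)*old."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be within [0, 1]")
    return x * new + (1.0 - x) * old


def estimate_x(old, mixed, new):
    """Solve the same model for x: x = (old - mixed) / (old - new)."""
    if old == new:
        # Without a difference between old and new, x cannot be determined.
        raise ValueError("old and new timings are equal; x is undetermined")
    return (old - mixed) / (old - new)


if __name__ == "__main__":
    # Hypothetical timings in seconds, NOT measurements from this thread.
    old, new, mixed = 10.0, 6.0, 8.8
    x = estimate_x(old, mixed, new)
    print(f"estimated x = {x:.2f}")
    print(f"predicted mixed time = {predict_mixed(old, new, x):.2f} s")
```

As noted above, revision sizes change over the life of a project, so any
"x" obtained this way is only a rough estimate.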

It is important to understand that there is no major structural difference
between physically and logically addressed repositories. Both have the same
item granularity and do the same pointer chasing when reconstructing the
contents. The differences are a simple manifest in a separate file vs. a
more complex index structure in the same file as the actual rev data.
Overall, the CPU load is the same.

What is expensive, however, is deflate and checksumming on the CPU side and
turning the pointer chasing into a random access orgy on the I/O side. The
first issue is addressed by the fulltext caches. Once in cache, no
checksumming etc. is necessary anymore. This is why they are so much faster
than reading data from OS caches.

The random I/O is much harder to eliminate, and logical addressing in FSFS
gets it down by only 50% or so due to the rev shard granularity. FSX will
hopefully and eventually do a much better job there.

-- Stefan^2.