On 17.02.2016 15:33, Ivan Zhakov wrote:
On 16 February 2016 at 00:47, <stef...@apache.org> wrote:
Author: stefan2
Date: Mon Feb 15 21:47:00 2016
New Revision: 1730617
URL: http://svn.apache.org/viewvc?rev=1730617&view=rev
Log:
Continue work on the svn_repos_get_logs4 to svn_repos_get_logs5 migration:
Switch the last svn_fs_paths_changed2 call to svn_fs_paths_changed3.
* subversion/libsvn_repos/log.c
(fs_mergeinfo_changed): No longer fetch the whole changes list. However,
we need to iterate twice for best total performance
and we need to minimize FS iterator lifetimes.
It seems that I would be -1 against this particular change. In the
current implementation the svn_fs_paths_changed3() is called twice
that in the worst case will lead to *double read from disk*.
Pick your worst-case behaviour:
(1) Crash the server with OOM.
(2) For each change in a revision, perform a 1-step history lookup
(random I/O, about 2x as much data to read as with (3)).
(3) Read a linear file section twice.
I went with option (3). What is your preference?
As far as I understand you're relying to the fact that the second call
will hit the FSFS/FSX cache. But there will be a significant
performance degradation comparing to the 1.9 implementation in the
case of cache miss.
On a system without OS-side file cache, 'log -g'
on the repository root would take at most twice
as long as in 1.9. Other operations are not
affected.
Running it on any other directory will actually
be faster in many cases with 1.10 because the new
history traversal code no longer reconstructs all
directories up the tree for each revision.
So, hardly anything people will complain about.
As we are adding more and more of such code, more and more users
become faced with the significant performance regression (see [1] and
other cases).
If you consider [1] a significant performance regression,
please follow up on it and review the two relevant
backports for 1.9.
Do you intend to resolve this problem in the future commits? I have
some obvious solutions in mind, but maybe you also know something
about this.
The only reasonable alternative is to pick option
(2) and hope that the performance regression in
real-life is acceptable.
[1] http://svn.haxx.se/users/archive-2015-12/0060.shtml
-- Stefan^2.