Can you please improve your log message to tell what you actually did here.
You improved the code flow within some private parts of fsfs to avoid creating
errors.
What your log message tells to somebody that is reading the log message for
review:
I read the message as
* “HUGE PERFORMANCE IMPROVEMENT FOR MULTITHREADING !!!”
* “Pool creation in error production is expensive and should be avoided”
So you’re telling that you make things less expensive, as the pool creation is
so expensive…
But in fact you are not changing any of this, but are avoiding gettext, which
doesn’t use apr pools at all. And the pools aren’t in the performance trace.
In the past we tried to use error code only errors for these *expensive*
errors. These errors would then only get a (translated) message when their
error message was used. That is why all our error codes have default messages.
But why do we use gettext on server processes side in the first place?
We have no way to transfer the locale to the server, so we use the “C” locale
anyway. Using gettext in the client makes sense, but on the server not so much.
Bert
From: Stefan Fuhrmann [mailto:[email protected]]
Sent: maandag 14 april 2014 14:01
To: Bert Huijben
Cc: Subversion Development; [email protected]
Subject: Re: svn commit: r1586947 - in
/subversion/trunk/subversion/libsvn_fs_fs: dag.c dag.h tree.c
On Sun, Apr 13, 2014 at 6:18 PM, Bert Huijben <[email protected]
<mailto:[email protected]> > wrote:
> -----Original Message-----
> From: [email protected] <mailto:[email protected]>
> [mailto:[email protected] <mailto:[email protected]> ]
> Sent: zondag 13 april 2014 11:45
> To: [email protected] <mailto:[email protected]>
> Subject: svn commit: r1586947 - in
> /subversion/trunk/subversion/libsvn_fs_fs: dag.c dag.h tree.c
>
> Author: stefan2
> Date: Sun Apr 13 09:45:03 2014
> New Revision: 1586947
>
> URL: http://svn.apache.org/r1586947
> Log:
> Improve MT scalability of the FSFS DAG traversal code.
>
> Error objects are a very expensive way to control the control flow
> as they carry their own pools, created from a thread-safe root pool.
>
> dag_open should not return an error to open_path if the dirent
> cannot be found, pass a NULL node back instead. This eliminates
> about 50% of all transitional error objects during log-y operations.
> As there are only 2 callers of dag_open, they are easy to adapt.
Unless there are a lot of string creations to create a vey specific error
message, I've never seen a construction of svn_error_t * in any performance
traces...
As the log message said: This code was abusing
error messages for non-erroneous situations (point 1:
really bad design) with 10s of thousands of errors
being created, consumed and re-created (point 2:
inefficient).
The third point is that the implicit dcigettext call as
well as the apr_pool_t creation and destruction for
those error objects require process-wide synchronization.
That limits scalability. I'm currently looking into various
scenarios where a few concurrent requests are already
enough to hog the server or even make it go OOM.
Unless you have some numbers to prove that this helps on more setups than
yours, I don't see this as a *single reason* to change current behavior.
If you insist:
Server-side of 'svn log -v -g $tsvn_repo/trunk' (25k revs total in repo)
%Ir #called function
3.72 14062 svn_error_createf.constprop.259
2.72 9370 dcgettext
0.45 14064 svn_error_clear
(plus various small suff)
So, roughly 7% of the instructions are being wasted on
bad interface design. Now imagine what happens to repos
with somewhat more complicated merge histories.
And these numbers do *not* cover the OS sync. overhead.
There could be valid other reasons of course.
Such as the ones mentioned in the log message.
This looks like another case of micro-optimizing certain code which performance
critical... I think there is more to gain with other kinds of optimizations,
like changing total algorithms to scale better.
Last time I used a different hashing *algorithm*, you still
called it a micro-optimization hoping that a compiler could
do the job that I did in code ...
It wouldn't be the first case where you optimized code that shouldn't have been
executed in the first place.
So, did you review that code in the first place?
Note that there is still an open thread on [email protected] <mailto:[email protected]> about
reverting a lot of changes like this in fsfs for guaranteed stability. I would
say this falls in this same discussion if we decide to revert those.
There has been a thread about 1.9.0alpha1 (multiple threads,
actually). I will start one about FSFS format7 by the end of
this week.
Looking further in your patch makes it appear that this is just a minor
refactoring...
Hm. The obvious strategy here would be "read first, post later",
don't you agree?
I know that you work really hard and I fully appreciate
your feedback (and all your code contributions). Just
don't let the stress get the better of you ... ;)
Not sure if the summary really makes sense for that as I don't think it will
provide a measurable performance improvement, while the summary makes it appear
that it resolves major multithreading problems.
As Julian already pointed out, the initial sentence ("lemma"
in Wiki-speak) may be somewhat of an overstatement. But
I'm in the process of figuring out why some operations e.g.
scale better in Apache than svnserve. This is one of the
things that came up. Now, both are faster with svnserve
seeing a slightly larger improvement.
-- Stefan^2.