Re: [PATCH] Saving a few cycles, part 2/2

Stefan Fuhrmann Mon, 26 Apr 2010 17:11:53 -0700

Julian Foad wrote:

On Sun, 2010-04-25, Stefan Fuhrmann wrote:

here comes the second set of local optimizations.
It contains a number of simple improvements over
the original code.


Hi Stefan.  Thanks for working on optimization.  Saving a few cycles can
be useful if it's in frequently-executed code so that the overall gain
is significant.  At the same time, we are trying to keep a good balance
against code simplicity.  Much of our code is more complex and longer
than it should be already, so I want to be careful about any change that
makes it even longer.

Have you got (or please could you get) some measurements for these
optimisations that you could show us?  From just looking at the patch, I
find it hard to believe that some of the changes make a consistent
difference, at least from the perspective of thinking about different
compilers.  Specific comments and questions are below.

All this code is over 3 weeks old now. I simply sent
the Easter weekend optimizing SVN, APR and zlib.
The total performance gains are tremendous.

Now, I'm in the tedious process of slicing the changes
into smaller, easily digestible chunks. Once I got them
all posted, I plan to write an extensive analysis of the
current SVN performance including several measurements
for different OS, FS, patch levels, repository formats etc.

Bottom line: Subversion-sized projects could technically
be checked out in less than 1 second (wire-speed permitting).

Also, in some cases you need to include a comment in the code saying
"This is done this way because it is faster than doing it the simple
way," otherwise the next person to edit the code will "improve" it back
to the way it was before.

Good point. Will do.

[[[
Numerous local optimizations.


When you write these log messages, please could you write a few words
about each change to explain why it's an improvement.  They are all
obvious to you, of course, but when the rest of us look at the patch,
some of the changes are not obvious at first.  It would be good if each
us us could read through your patch and think to ourselves, "Yes, I
agree with Stefan's reasoning for that change."

* subversion/include/svn_delta.h
  (enum svn_delta_action): ensure that these definitions
  will always have these values


I saw in a later part of the patch that this is so you can make the
function 'decode_instruction()' rely on those values.  As you know,
adding the explicit "=0", "=1", "=2" on the enum definition does not
make any difference to the compiler, it just provides a hint to the
human developers that these values might be significant.

We need a stronger hint than this.  Please add code comments in both the
enum definition and the function 'decode_instruction()', saying "The
enum values must match the instruction encoding values."

Oh! I wasn't aware of that - being a C++ guy.
But as long as comments can "fix" that, so be it ... ;)

* subversion/libsvn_delta/compose_delta.c
  (search_offset_index): use size_t to index memory


It looks like that is just for consistency with the other arguments.  Or
does it play a part in speed optimization as well?

Basically, it's a consistency improvement. However,
on LP64 and LLP64 systems, it saves a conversion
from int -> ptrdiff_t.

  (copy_source_ops): dito; optimize a common case


This is optimizing use of the C 'modulo' operator ('A % B') by
hand-coding B=1 and B=2 to use "& (B-1)" instead.  Are you sure your C
compiler doesn't already do this?  My intuition expects the compiler to
do it.

How would the compiler know that ptn_length is often
1 or 2? In the general case, such conditional code would
be a gross pessimization.

* subversion/libsvn_delta/svndiff.c
  (decode_file_offset, decode_size): use slightly more
  efficient formulations


In these two functions you are expanding two lines of expressions into
many simpler lines, and introducing a temporary variable to eliminate
multiple loads and stores.

I am suspicious of this change.  It is the compiler's job to transform
an expression into a longer stream of simple statements, and often the
compiler will do a better job than a human.  (Yes I know hand-coded
optimisations can sometimes be useful.)

Again, the compiler can't do much here: unless it summons
the daemons of global and feedback optimization (or the
user tells him not to care), there is no way to be *sure* that
p and val point to disjoint memory regions. Thus, it cannot
optimize the *val = f(*val) code.

My code saves 2 out of 3 memory accesses at the expense
of an additional write at the end.

Also, these changes (and the 'modulo' change above) look like the sort
of change that could generate faster code on one compiler but slower
code on another.

I'm very conscious about the impact on other compilers
and architectures. Therefore, I restrict my extensive arsenal
of "coding tricks" to what should be beneficial on most
CPUs and is hard for compilers to "screw up".

If you could show the numbers, that would help me/us to understand the
trade-off.

These changes were added in the later stages of my
optimization session. At that point, the major parts had
already gotten streamlined leaving other sections where
runtime seeped through cracks all over the place.

Each of the changes "putties" one of the cracks I came
across. So, individual savings are in the .5 .. 2% range
but the total effect is somewhere between 3 and 10%
(depending on platform, compiler, etc.).

  (decode_instruction): directly map action codes -
  made possible by defining the enum values

* subversion/libsvn_subr/checksum.c

  (svn_checksum_parse_hex): eliminate CTR function calls


Did you mean 'CRT' (C Run-Time library)?  If so, would
svn_ctype_isxdigit() be useful, or does calling that have the same
overhead as calling the CRT?  It seems wrong to write out ASCII-hex
conversion functions in long-hand code, and is the sort of code I would
normally replace by a function call whenever I see it.  At least make it
a separate function or macro in the source file, with a comment
explaining why the system 'isxdigit' or 'svn_ctype_isxdigit' is not
being used.

My problem was that I couldn't see what CRT function
took *that* long. Parsing a checksum should be so rare
that it wouldn't even show up in the profiler. But it did.

Elsethread, isalpha() got blamed for it. I will change my
code using the feedback I got.

-- Stefan^2.

Re: [PATCH] Saving a few cycles, part 2/2

Reply via email to