On Sun, 2010-04-25, Stefan Fuhrmann wrote: > here comes the second set of local optimizations. > It contains a number of simple improvements over > the original code.
Hi Stefan. Thanks for working on optimization. Saving a few cycles can be useful if it's in frequently-executed code so that the overall gain is significant. At the same time, we are trying to keep a good balance against code simplicity. Much of our code is more complex and longer than it should be already, so I want to be careful about any change that makes it even longer. Have you got (or please could you get) some measurements for these optimisations that you could show us? From just looking at the patch, I find it hard to believe that some of the changes make a consistent difference, at least from the perspective of thinking about different compilers. Specific comments and questions are below. Also, in some cases you need to include a comment in the code saying "This is done this way because it is faster than doing it the simple way," otherwise the next person to edit the code will "improve" it back to the way it was before. > [[[ > Numerous local optimizations. When you write these log messages, please could you write a few words about each change to explain why it's an improvement. They are all obvious to you, of course, but when the rest of us look at the patch, some of the changes are not obvious at first. It would be good if each us us could read through your patch and think to ourselves, "Yes, I agree with Stefan's reasoning for that change." > * subversion/include/svn_delta.h > (enum svn_delta_action): ensure that these definitions > will always have these values I saw in a later part of the patch that this is so you can make the function 'decode_instruction()' rely on those values. As you know, adding the explicit "=0", "=1", "=2" on the enum definition does not make any difference to the compiler, it just provides a hint to the human developers that these values might be significant. We need a stronger hint than this. Please add code comments in both the enum definition and the function 'decode_instruction()', saying "The enum values must match the instruction encoding values." > * subversion/libsvn_delta/compose_delta.c > (search_offset_index): use size_t to index memory It looks like that is just for consistency with the other arguments. Or does it play a part in speed optimization as well? > (copy_source_ops): dito; optimize a common case This is optimizing use of the C 'modulo' operator ('A % B') by hand-coding B=1 and B=2 to use "& (B-1)" instead. Are you sure your C compiler doesn't already do this? My intuition expects the compiler to do it. > * subversion/libsvn_delta/svndiff.c > (decode_file_offset, decode_size): use slightly more > efficient formulations In these two functions you are expanding two lines of expressions into many simpler lines, and introducing a temporary variable to eliminate multiple loads and stores. I am suspicious of this change. It is the compiler's job to transform an expression into a longer stream of simple statements, and often the compiler will do a better job than a human. (Yes I know hand-coded optimisations can sometimes be useful.) Also, these changes (and the 'modulo' change above) look like the sort of change that could generate faster code on one compiler but slower code on another. If you could show the numbers, that would help me/us to understand the trade-off. > (decode_instruction): directly map action codes - > made possible by defining the enum values > > * subversion/libsvn_subr/checksum.c > (svn_checksum_parse_hex): eliminate CTR function calls Did you mean 'CRT' (C Run-Time library)? If so, would svn_ctype_isxdigit() be useful, or does calling that have the same overhead as calling the CRT? It seems wrong to write out ASCII-hex conversion functions in long-hand code, and is the sort of code I would normally replace by a function call whenever I see it. At least make it a separate function or macro in the source file, with a comment explaining why the system 'isxdigit' or 'svn_ctype_isxdigit' is not being used. > * subversion/libsvn_subr/stream.c > (stream_readline): numbytes is invariant until EOF > ]]] > plain text document attachment (PerformanceNumerics.patch): > Index: subversion/include/svn_delta.h > =================================================================== > --- subversion/include/svn_delta.h (revision 937673) > +++ subversion/include/svn_delta.h (working copy) > @@ -100,7 +100,7 @@ > * It must be the case that 0 <= @a offset < @a offset + > * @a length <= size of source view. > */ > - svn_txdelta_source, > + svn_txdelta_source = 0, > > /** Append the @a length bytes at @a offset in the target view, to the > * target. > @@ -119,7 +119,7 @@ > * useful in encoding long runs of consecutive characters, long runs > * of CR/LF pairs, etc. > */ > - svn_txdelta_target, > + svn_txdelta_target = 1, > > /** Append the @a length bytes at @a offset in the window's @a new string > * to the target. > @@ -129,7 +129,7 @@ > * order with no overlap at the moment; svn_txdelta_to_svndiff() > * depends on this. > */ > - svn_txdelta_new > + svn_txdelta_new = 2 > }; > > /** A single text delta instruction. */ > Index: subversion/libsvn_delta/compose_delta.c > =================================================================== > --- subversion/libsvn_delta/compose_delta.c (revision 937673) > +++ subversion/libsvn_delta/compose_delta.c (working copy) > @@ -165,10 +165,10 @@ > as hint because most lookups come as a sequence of decreasing values > for OFFSET and they concentrate on the lower end of the array. */ > > -static int > -search_offset_index(const offset_index_t *ndx, apr_size_t offset, int hint) > +static apr_size_t > +search_offset_index(const offset_index_t *ndx, apr_size_t offset, apr_size_t > hint) > { > - int lo, hi, op; > + apr_size_t lo, hi, op; > > assert(offset < ndx->offs[ndx->length]); > > @@ -635,13 +635,13 @@ > static void > copy_source_ops(apr_size_t offset, apr_size_t limit, > apr_size_t target_offset, > - int hint, > + apr_size_t hint, > svn_txdelta__ops_baton_t *build_baton, > const svn_txdelta_window_t *window, > const offset_index_t *ndx, > apr_pool_t *pool) > { > - int op_ndx = search_offset_index(ndx, offset, hint); > + apr_size_t op_ndx = search_offset_index(ndx, offset, hint); > for (;; ++op_ndx) > { > const svn_txdelta_op_t *const op = &window->ops[op_ndx]; > @@ -694,7 +694,10 @@ > The idea here is to transpose the pattern, then generate > another overlapping copy. */ > const apr_size_t ptn_length = off[0] - op->offset; > - const apr_size_t ptn_overlap = fix_offset % ptn_length; > + const apr_size_t ptn_overlap = ptn_length > 2 > + ? fix_offset % ptn_length > + : fix_offset & (ptn_length-1); > + > apr_size_t fix_off = fix_offset; > apr_size_t tgt_off = target_offset; > assert(ptn_length > ptn_overlap); > Index: subversion/libsvn_delta/svndiff.c > =================================================================== > --- subversion/libsvn_delta/svndiff.c (revision 937673) > +++ subversion/libsvn_delta/svndiff.c (working copy) > @@ -361,16 +361,24 @@ > const unsigned char *p, > const unsigned char *end) > { > + svn_filesize_t temp = 0; > if (p + MAX_ENCODED_INT_LEN < end) > end = p + MAX_ENCODED_INT_LEN; > /* Decode bytes until we're done. */ > - *val = 0; > while (p < end) > { > - *val = (*val << 7) | (*p & 0x7f); > - if (((*p++ >> 7) & 0x1) == 0) > + unsigned c = *p; Here 'c' is 'unsigned' meaning 'unsigned int'. (Please write 'unsigned int' in full if that's intended.) Shouldn't it be 'unsigned char', or do you think using an int is faster? Or, as in the next change below, did you want it to match the type of 'temp' - in this case svn_filesize_t? > + temp = (temp << 7) | (c & 0x7f); > + ++p; > + > + if (c < 0x80) > + { > + *val = temp; > return p; > + } > } > + > + *val = temp; > return NULL; > } > > @@ -382,16 +390,24 @@ > const unsigned char *p, > const unsigned char *end) > { > + apr_size_t temp = 0; Style nit: please put a blank line after the declarations block, here and after declaring 'c' below, and also in the function above. > if (p + MAX_ENCODED_INT_LEN < end) > end = p + MAX_ENCODED_INT_LEN; > /* Decode bytes until we're done. */ > - *val = 0; > while (p < end) > { > - *val = (*val << 7) | (*p & 0x7f); > - if (((*p++ >> 7) & 0x1) == 0) > + apr_size_t c = *p; (By contrast, here 'c' is 'apr_size_t'.) > + temp = (temp << 7) | (c & 0x7f); > + ++p; > + > + if (c < 0x80) > + { > + *val = temp; > return p; > + } > } > + > + *val = temp; > return NULL; > } > > @@ -458,27 +474,30 @@ > const unsigned char *p, > const unsigned char *end) > { > + apr_size_t c; > + apr_size_t action; > + > if (p == end) > return NULL; > > /* Decode the instruction selector. */ > - switch ((*p >> 6) & 0x3) > - { > - case 0x0: op->action_code = svn_txdelta_source; break; > - case 0x1: op->action_code = svn_txdelta_target; break; > - case 0x2: op->action_code = svn_txdelta_new; break; > - case 0x3: return NULL; > - } > + c = *p; > + p++; Write "c = *p++". (Try not to make the code more long-winded than necessary.) > + action = (c >> 6) & 0x3; > + if (action >= 0x3) > + return NULL; > > + op->action_code = (enum svn_delta_action)(action); > + > /* Decode the length and offset. */ > - op->length = *p++ & 0x3f; > + op->length = c & 0x3f; > if (op->length == 0) > { > p = decode_size(&op->length, p, end); > if (p == NULL) > return NULL; > } > - if (op->action_code != svn_txdelta_new) > + if (action != svn_txdelta_new) > { > p = decode_size(&op->offset, p, end); > if (p == NULL) > Index: subversion/libsvn_subr/checksum.c > =================================================================== > --- subversion/libsvn_subr/checksum.c (revision 937673) > +++ subversion/libsvn_subr/checksum.c (working copy) > @@ -206,12 +206,39 @@ > > for (i = 0; i < len; i++) > { > - if ((! isxdigit(hex[i * 2])) || (! isxdigit(hex[i * 2 + 1]))) > + char c1 = hex[i * 2]; > + char c2 = hex[i * 2 + 1]; > + > + svn_boolean_t bad = FALSE; > + unsigned char upper; > + unsigned char lower; > + > + if (c1 > '9') > + { > + bad = (c1 > 'f') || (c1 < 'a'); > + upper = c1 - 'a' + 10; > + } > + else > + { > + bad = c1 < '0'; > + upper = c1 - '0'; > + } > + > + if (c2 > '9') > + { > + bad |= (c2 > 'f') || (c2 < 'a'); > + lower = c2 - 'a' + 10; > + } > + else > + { > + bad |= c2 < '0'; > + lower = c2 - '0'; > + } > + > + if (bad) > return svn_error_create(SVN_ERR_BAD_CHECKSUM_PARSE, NULL, NULL); > > - ((unsigned char *)(*checksum)->digest)[i] = > - (( isalpha(hex[i*2]) ? hex[i*2] - 'a' + 10 : hex[i*2] - '0') << 4) | > - ( isalpha(hex[i*2+1]) ? hex[i*2+1] - 'a' + 10 : hex[i*2+1] - '0'); > + ((unsigned char *)(*checksum)->digest)[i] = (upper << 4) | lower; > is_zeros |= (*checksum)->digest[i]; > } > > Index: subversion/libsvn_subr/stream.c > =================================================================== > --- subversion/libsvn_subr/stream.c (revision 937673) > +++ subversion/libsvn_subr/stream.c (working copy) > @@ -355,9 +355,9 @@ > > /* Read into STR up to and including the next EOL sequence. */ > match = eol_str; > + numbytes = 1; > while (*match) > { > - numbytes = 1; 'numbytes' is indeed invariant until the end of the loop, but this change is surely in the category of micro-optimisation. Unless you show me a significant measured gain, we should instead move the declaration of 'numbytes' *into* the scope of this block, for code readability. Then we should re-design the whole "read line" approach to avoid the need for one-character-at-a-time calls to the stream's "read" function. (Such a re-design would require API changes.) > SVN_ERR(svn_stream_read(stream, &c, &numbytes)); > if (numbytes != 1) > { > - Julian