[issue13134] speed up finding of one-character strings

Marc-Andre Lemburg Sun, 09 Oct 2011 04:23:13 -0700

Marc-Andre Lemburg <m...@egenix.com> added the comment:

[Posted the reply to the right ticket; see issue13136 for the original
 post to the wrong ticket]


Antoine Pitrou wrote:
> 
> Antoine Pitrou <pit...@free.fr> added the comment:
> 
>> Before going further with this, I'd suggest you have a look at your
>> compiler settings.
> 
> They are set by the configure script:
> 
> gcc -pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall
> -Wstrict-prototypes    -I. -I./Include    -DPy_BUILD_CORE -o
> Objects/unicodeobject.o Objects/unicodeobject.c

Which gcc version are you using?
Is it possible that you have -fno-builtin enabled?

>> Such optimizations are normally performed by the
>> compiler and don't need to be implemented in C, making maintenance
>> harder.
> 
> The fact that the glibc includes such optimization (in much more
> sophisticated form) suggests to me that many compilers don't perform
> these optimizations automatically.

When using gcc, the glibc functions are usually not used at all,
since gcc comes with a (rather large) set of builtins which are
inlined directly, if you have optimizations enabled and inlining
is found to be more efficient than calling the glibc function:

http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

glibc includes the optimized versions because it has to implement
the C library (obviously), and to cover cases where inlining does
not happen.

>> I tested using memchr() when writing those "naive" loops.
> 
> memchr() is mentioned in another issue, #13134.
> 
>> memchr()
>> is inlined by the compiler just like the direct loop
> 
> I don't think so. If you look at the glibc's memchr() implementation,
> it's a sophisticated routine, not a trivial loop. Perhaps you're
> thinking about memcpy().

See http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html and the
assembler output. If it's not inlined, then something must be
preventing this and it would be good to find out why.
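
One way to check this is to compile both variants and read the generated
assembler. The following is only a minimal sketch: the function names
find_char_loop and find_char_memchr and the file name find.c are made up
for illustration and are not taken from unicodeobject.c.

```c
/* find.c: hypothetical test file, not part of CPython.
 *
 * Suggested check (assumes gcc is installed):
 *   gcc -O3 -S find.c -o find.s
 *   gcc -O3 -fno-builtin -S find.c -o find_nobuiltin.s
 * Then look for a "call memchr" instruction in each output to see
 * whether the compiler emitted a library call or expanded the search.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Direct loop: returns the index of the first ch in s[0..n), or -1. */
ptrdiff_t find_char_loop(const unsigned char *s, size_t n, unsigned char ch)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (s[i] == ch)
            return (ptrdiff_t)i;
    }
    return -1;
}

/* Same contract, expressed through memchr(). */
ptrdiff_t find_char_memchr(const unsigned char *s, size_t n, unsigned char ch)
{
    assert(s != NULL || n == 0);
    /* Skip the call for empty input so memchr() never sees a NULL pointer. */
    const unsigned char *p = n ? memchr(s, ch, n) : NULL;
    return p ? (ptrdiff_t)(p - s) : -1;
}
```

What the assembler shows will depend on the gcc version, the target and
the flags, so the two outputs should be compared on the machine in
question rather than assumed.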

>> and the generated
>> code for the direct version is often easier to optimize for the compiler
>> than the memchr() one, since it receives more knowledge about the used
>> data types.
> 
> ?? Data types are fixed in the memchr() definition, there's no knowledge
> to be gained by inlining.

There is: the compiler will have alignment information available and
can also benefit from using registers instead of the stack, knowledge
about processor cache lines, etc. Such information is lost when calling
a function. The function call itself will also create some overhead.

BTW: You should not only test the optimization with long strings, but also
with short ones (e.g. 2-15 chars) - which is a much more common case
in practice.
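
A rough way to compare both approaches across string lengths is sketched
below. Everything in it is an assumption made for illustration: the names
scan_loop, scan_memchr, time_loop and time_memchr, the 4096-byte buffer
limit, and the choice to place the match at the last position. It is not
the benchmark used for this issue, and clock() is a coarse timer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

enum { MAX_LEN = 4096 };  /* arbitrary upper bound for the test buffer */

/* Direct loop: index of the first ch in s[0..n), or -1. */
ptrdiff_t scan_loop(const unsigned char *s, size_t n, unsigned char ch)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == ch)
            return (ptrdiff_t)i;
    return -1;
}

/* Same contract via memchr(). */
ptrdiff_t scan_memchr(const unsigned char *s, size_t n, unsigned char ch)
{
    const unsigned char *p = n ? memchr(s, ch, n) : NULL;
    return p ? (ptrdiff_t)(p - s) : -1;
}

/* Generates a timing function that calls `find` directly, so the compiler
 * remains free to inline it. The volatile key is re-read on every
 * iteration, which is meant to keep the search from being hoisted out of
 * the loop; the volatile sink keeps the result from being discarded.
 * Returns CPU seconds spent on `reps` searches of an n-byte buffer whose
 * only match is the last byte. */
#define DEFINE_TIMER(name, find)                                \
    double name(size_t n, long reps)                            \
    {                                                           \
        static unsigned char buf[MAX_LEN];                      \
        volatile unsigned char key = 'x';                       \
        volatile ptrdiff_t sink = 0;                            \
        assert(n >= 1 && n <= MAX_LEN);                         \
        memset(buf, 'a', n);                                    \
        buf[n - 1] = 'x';                                       \
        clock_t start = clock();                                \
        for (long r = 0; r < reps; r++)                         \
            sink += find(buf, n, key);                          \
        clock_t stop = clock();                                 \
        return (double)(stop - start) / CLOCKS_PER_SEC;         \
    }

DEFINE_TIMER(time_loop, scan_loop)
DEFINE_TIMER(time_memchr, scan_memchr)
```

Calling time_loop(n, reps) and time_memchr(n, reps) for n = 2, 15 and
4096 with a large reps value should give a first impression of how the
two variants behave on short versus long inputs.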

----------
nosy: +lemburg

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13134>
_______________________________________