Marc-Andre Lemburg <m...@egenix.com> added the comment:

[Posted the reply to the right ticket; see issue13136 for the original post to the wrong ticket]
Antoine Pitrou wrote:
>
> Antoine Pitrou <pit...@free.fr> added the comment:
>
>> Before going further with this, I'd suggest you have a look at your
>> compiler settings.
>
> They are set by the configure script:
>
> gcc -pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall
> -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o
> Objects/unicodeobject.o Objects/unicodeobject.c

Which gcc version are you using? Is it possible that you have -fno-builtin
enabled?

>> Such optimizations are normally performed by the
>> compiler and don't need to be implemented in C, making maintenance
>> harder.
>
> The fact that the glibc includes such optimization (in much more
> sophisticated form) suggests to me that many compilers don't perform
> these optimizations automatically.

When using gcc, the glibc functions are usually not used at all,
since gcc comes with a (rather large) set of builtins which are
inlined directly, if you have optimizations enabled and inlining
is found to be more efficient than calling the glibc function:

http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

glibc includes the optimized versions since it has to implement
the C library (obviously) and for cases where inlining does not
happen.

>> I tested using memchr() when writing those "naive" loops.
>
> memchr() is mentioned in another issue, #13134.
>
>> memchr()
>> is inlined by the compiler just like the direct loop
>
> I don't think so. If you look at the glibc's memchr() implementation,
> it's a sophisticated routine, not a trivial loop. Perhaps you're
> thinking about memcpy().

See http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html and the
assembler output. If it's not inlined, then something must be
preventing this and it would be good to find out why.

>> and the generated
>> code for the direct version is often easier to optimize for the compiler
>> than the memchr() one, since it receives more knowledge about the used
>> data types.
>
> ??
> Data types are fixed in the memchr() definition, there's no knowledge
> to be gained by inlining.

There is: the compiler will have alignment information available
and can also benefit from using registers instead of the stack,
knowledge about processor cache lines, etc. Such information is
lost when calling a function. The function call itself will also
create some overhead.

BTW: You should not only test the optimization with long strings,
but also with short ones (e.g. 2-15 chars) - which is a much more
common case in practice.

----------
nosy: +lemburg

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13134>
_______________________________________
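
For reference, the two variants compared in this comment (a direct loop and a memchr() call) could look roughly like the sketch below. This is only an illustration: the names find_char_loop, find_char_memchr and check_finders_agree are hypothetical and are not taken from Objects/unicodeobject.c. Whether gcc inlines the memchr() call depends on the gcc version, the target and the flags, so the sketch makes no claim either way; it just gives something small to compile and inspect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Direct loop (the "naive" scan): returns a pointer to the first
   occurrence of ch in s[0..len), or NULL if it is not present. */
static const char *find_char_loop(const char *s, size_t len, char ch)
{
    const char *end = s + len;
    for (; s < end; s++) {
        if (*s == ch)
            return s;
    }
    return NULL;
}

/* Same search expressed through the C library's memchr().
   memchr() compares bytes as unsigned char, hence the cast. */
static const char *find_char_memchr(const char *s, size_t len, char ch)
{
    return memchr(s, (unsigned char)ch, len);
}

/* Hypothetical helper: aborts if the two variants disagree. */
static void check_finders_agree(const char *s, size_t len, char ch)
{
    assert(find_char_loop(s, len, ch) == find_char_memchr(s, len, ch));
}
```

Compiling this with "gcc -O3 -S", once as-is and once with -fno-builtin added, and comparing the assembler output for find_char_memchr is one way to see what the compiler does with the call on a given setup. Any timing run on top of this should cover short inputs (2-15 chars) as well as long ones.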