https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199587
--- Comment #1 from Eitan Adler <ead...@freebsd.org> ---
(adding to the bug)

From bde:

This is basically confusing the compiler into producing not-so-good code in a different way. Your implementation is a bit cleaner, since it doesn't arrange the source code in a way that it thinks will be good for the object code. This results in it being slower for old compilers, faster for some in-between compilers, and no different for new compilers. However, all the C versions are now faster than the asm versions on amd64 and i386 on 2 i7 CPUs. I added tests for the latter, and sprinkled some volatiles to stop the compiler optimizing away the whole loop for the asm (libc) versions.

i386, 4790K @ 4.28GHz:

gcc-3.3.3 -O (but no -march etc. complications):
    10.0 Gcycles -- libc strncmp() (asm source, external linkage)
    10.1 Gcycles -- libc strncmp() (copy of the C version)
    11.3 Gcycles -- My Implementation

gcc-3.3.3 -O2:
    12.0 Gcycles -- libc strncmp() (asm source, external linkage)
     9.4 Gcycles -- libc strncmp() (copy of the C version)
    10.2 Gcycles -- My Implementation

libc asm strncmp() really was made 20% slower (10.0 to 12.0 Gcycles) by increasing the optimization level from -O to -O2, although strncmp() itself didn't change. This might be due to the loop being poorly aligned. Tuning with -march might be needed to avoid 20% differences, so the mere 10% differences in these tests might be noise. (I didn't bother giving many data points, since noise from rerunning the tests is much smaller than the 10-20% differences from tuning.)

gcc-4.2.1 -O:
    11.4 Gcycles -- libc strncmp() (asm source, external linkage)
    13.1 Gcycles -- libc strncmp() (copy of the C version)
    12.1 Gcycles -- My Implementation

gcc-4.2.1 -O is much slower than gcc-3.3.3, but not so bad for your implementation.

gcc-4.2.1 -O2:
    10.1 Gcycles -- libc strncmp() (asm source, external linkage)
     9.5 Gcycles -- libc strncmp() (copy of the C version)
     9.3 Gcycles -- My Implementation

gcc-4.2.1 is OK.
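
For readers without the sources at hand, the kind of C loop being compared in these tests can be sketched roughly as follows. This is only an illustration under assumptions: the name simple_strncmp is hypothetical, and the code is neither the libc source nor the implementation attached to the bug. It is a plain byte-at-a-time comparison with no attempt to arrange the source for a particular compiler.

```c
#include <stddef.h>

/*
 * Hypothetical name, for illustration only: a byte-at-a-time strncmp().
 * Not the libc source and not the implementation attached to the bug.
 */
int
simple_strncmp(const char *s1, const char *s2, size_t n)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    for (; n != 0; n--, p1++, p2++) {
        if (*p1 != *p2)
            return (*p1 - *p2);    /* bytes compare as unsigned char */
        if (*p1 == '\0')
            break;                 /* both strings ended together */
    }
    return (0);
}
```

How a given compiler turns a loop like this into object code is what the cycle counts above are measuring, so small source-level rearrangements can move the numbers in either direction.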

amd64, Xeon 5650 @ 2.67GHz:

clang -O:
The calls to *strcmp() were almost all optimized away. I fixed this by replacing str1 in the call with str1 + v, where v is a volatile int with value 0.
    13.8 Gcycles -- libc strncmp() (C source, external linkage)
    13.8 Gcycles -- libc strncmp() (copy of the C version)
    13.8 Gcycles -- My Implementation

libc asm strncmp() is of interest here although it doesn't exist -- if it existed, then it would be more bogus than on i386, since amd64 doesn't run on the 1990-era CPUs where the asm version was probably faster. The asm i386 version was tuned for original i386's and has barely changed since then. Just as well, since it would be very messy with tuning for 10-20 generations of CPUs with several classes of CPU per generation. amd64 libc string functions used to be missing all silly optimizations like this, but now optimize the almost-never-used function stpcpy(), and the asm versions of strcat() and strcmp() are probably mistakes too.

i386, Xeon 5650 @ 2.67GHz:

clang -O [-march=native makes no difference]:
    12.0 Gcycles -- libc strncmp() (asm source, external linkage)
    15.1 Gcycles -- libc strncmp() (copy of the C version)
    11.5 Gcycles -- My Implementation

clang is even more confused by the copy of libc C strncmp() than gcc-4.2.1.

Bruce
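
The str1 + v workaround described for the clang runs can be sketched as follows. This is a minimal sketch, not the actual test program: the function name run_strncmp_loop and its parameters are hypothetical, and the timing code that would surround the loop is omitted. Because v is volatile, the compiler has to reload it on every iteration and cannot assume it is 0, so it cannot fold the loop into a single call or drop the calls altogether.

```c
#include <stddef.h>
#include <string.h>

/* volatile: must be reloaded on every use; the compiler cannot assume 0 */
static volatile int v = 0;

/*
 * Hypothetical helper, for illustration only: call strncmp() iters times
 * and return the sum of the results so that each result counts as used.
 */
long
run_strncmp_loop(const char *str1, const char *str2, size_t n, long iters)
{
    long sum = 0;
    long i;

    for (i = 0; i < iters; i++)
        sum += strncmp(str1 + v, str2, n);   /* str1 + v, not plain str1 */
    return (sum);
}
```

A cycle counter or clock read before and after the call to run_strncmp_loop() would then give the total cost of the iterations, including the small overhead of the volatile load.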