On Fri, Nov 8, 2013 at 9:51 PM, Christian Schulte <c...@schulte.it> wrote:
> Ok. Reason I am asking is this:

...something that has nothing to do with COPTS or make release, the
ostensible thread subject...

> $ cc bcmp.c
> $ time ./a.out
> 0m17.83s real 0m16.92s user 0m0.87s system
> $ cc -O2 bcmp.c
> $ time ./a.out
> 1m0.98s real 1m0.17s user 0m0.87s system
> $ cc memcmp.c
> $ time ./a.out
> 0m17.41s real 0m16.56s user 0m0.87s system
> $ cc -O2 memcmp.c
> $ time ./a.out
> 1m1.03s real 1m0.18s user 0m0.88s system
>
> The difference comes from GCC optimizing away calls to the libc assembly
> versions of bcmp/memcmp. For 'len' values greater than or equal to 8
> (change BSIZ below), those assembly versions perform way better than
> 'cmpsb' inlined by GCC so that the following may be useful as most of
> the bcmp/memcmp calls are using 'len' values greater than 8 and the
> added function call overhead due to GCC no longer inlining these
> functions should not be an issue.

Sorry, but I don't really find your tests convincing.

* Only test the worst case of a matching buffer.
* Unreasonably large example used (are there *any* 256MB memcmp or
  bcmp in the kernel?)
* Use of fprintf in the inner loop adds large fixed costs including
  syscalls to what should be a microbenchmark.
* Measurements aren't of just the inner loop.
* No test showing the suggested compiler options actually have the
  suggested effect.

If you want to show that changing A will have a positive effect, you
need to have your test be as close a simulation of A as you can. This
doesn't seem to be that.

Philip Guenther
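
As a rough illustration of the points above, a tighter test might look something like the sketch below. None of it comes from the original bcmp.c or memcmp.c: the names time_memcmp and run_cases, the MAXLEN and ITERS values, and the choice of lengths (8, 16, 64) are all assumptions made for the example. It uses small buffers, covers both matching and mismatching inputs, does no I/O inside the loop, and places the clock reads directly around the loop. The inline asm barrier is a GCC/Clang extension.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAXLEN 64        /* hypothetical: largest length tested */
#define ITERS  1000000L  /* hypothetical: calls per timed run */

/*
 * Time `iters` calls of memcmp(a, b, len) and return elapsed seconds.
 * The number of nonzero results is stored in *nonzero so the work
 * cannot be discarded. Only the loop sits between the clock reads.
 */
double
time_memcmp(const void *a, const void *b, size_t len, long iters,
    long *nonzero)
{
	struct timespec t0, t1;
	long i, nz = 0;

	assert(iters > 0 && nonzero != NULL);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		/* Compiler barrier (GCC/Clang extension): treat memory as
		 * possibly changed, so the compare is redone each pass. */
		__asm__ __volatile__("" : : "r"(a), "r"(b) : "memory");
		if (memcmp(a, b, len) != 0)
			nz++;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*nonzero = nz;
	return (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Run equal, first-byte-differs and last-byte-differs cases per length.
 * All printing happens outside the timed region. */
void
run_cases(void)
{
	static const size_t lens[] = { 8, 16, MAXLEN };
	unsigned char a[MAXLEN], b[MAXLEN];
	size_t i, len;
	long nz;
	double t;

	for (i = 0; i < sizeof(lens) / sizeof(lens[0]); i++) {
		len = lens[i];
		memset(a, 0xa5, sizeof(a));
		memset(b, 0xa5, sizeof(b));

		t = time_memcmp(a, b, len, ITERS, &nz);
		printf("len %3zu equal:        %.6f s\n", len, t);

		b[0] ^= 1;		/* mismatch at the first byte */
		t = time_memcmp(a, b, len, ITERS, &nz);
		printf("len %3zu differ first: %.6f s\n", len, t);

		b[0] ^= 1;		/* restore, then mismatch at the end */
		b[len - 1] ^= 1;
		t = time_memcmp(a, b, len, ITERS, &nz);
		printf("len %3zu differ last:  %.6f s\n", len, t);
	}
}
```

To use it, call run_cases() from main, build once with and once without the compiler options under test, and compare the numbers. Inspecting the generated assembly (cc -S) would show whether the memcmp call was actually inlined or left as a call into libc in each build.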