On 11/09/13 22:21, Philip Guenther wrote:
>
> Sorry, but I don't really find your tests convincing.
>
> * Only test the worst case of a matching buffer.
> * Unreasonably large example used (are there *any* 256MB memcmp or
>   bcmp in the kernel?)
> * Use of fprintf in the inner loop adds large fixed costs including
>   syscalls to what should be a microbenchmark.
> * Measurements aren't of just the inner loop.
> * No test showing the suggested compiler options actually have the
>   suggested effect.
>
> If you want to show that changing A will have a positive effect, you
> need to have your test be as close a simulation of A as you can.  This
> doesn't seem to be that.
>

All valid points.

I have already compared the generated code using 'objdump -d'.  As soon
as GCC is told to optimize, it inlines bcmp/memcmp using 'repz cmpsb'
(see '/usr/src/gnu/gcc/gcc/config/i386/i386.md', line 18622 onward, and
the function 'expand_builtin_memcmp' in
'/usr/src/gnu/gcc/gcc/builtins.c').

The inlined version is slower even when comparing just a few bytes, and
the difference becomes more significant as the number of bytes to
compare grows.  See the last result, for just 128 bytes (0m29.58s vs.
0m5.32s user time; note that this run uses 100000000 iterations rather
than 1000000000).

$ cc -DBSIZ=4 -DITERATIONS=1000000000 -O2 bcmp.c
    0m23.54s real     0m23.24s user     0m0.00s system

$ cc -DBSIZ=4 -DITERATIONS=1000000000 -O2 -fno-builtin-bcmp -fno-builtin-memcmp bcmp.c
    0m18.79s real     0m18.76s user     0m0.00s system

$ cc -DBSIZ=8 -DITERATIONS=1000000000 -O2 bcmp.c
    0m32.46s real     0m32.15s user     0m0.00s system

$ cc -DBSIZ=8 -DITERATIONS=1000000000 -O2 -fno-builtin-bcmp -fno-builtin-memcmp bcmp.c
    0m20.03s real     0m20.00s user     0m0.00s system

$ cc -DBSIZ=16 -DITERATIONS=1000000000 -O2 bcmp.c
    0m49.81s real     0m49.78s user     0m0.00s system

$ cc -DBSIZ=16 -DITERATIONS=1000000000 -O2 -fno-builtin-bcmp -fno-builtin-memcmp bcmp.c
    0m22.62s real     0m22.57s user     0m0.00s system

$ cc -DBSIZ=128 -DITERATIONS=100000000 -O2 bcmp.c
    0m29.66s real     0m29.58s user     0m0.00s system

$ cc -DBSIZ=128 -DITERATIONS=100000000 -O2 -fno-builtin-bcmp -fno-builtin-memcmp bcmp.c
    0m5.33s real     0m5.32s user     0m0.00s system

$ cat bcmp.c
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/mman.h>

#define VALUE (0xff)

int
main(int argc, char *argv[])
{
	void *b1, *b2;
	int i;

	b1 = malloc(BSIZ);
	if (b1 == NULL) {
		fprintf(stderr, "unable to allocate memory: %s\n",
		    strerror(errno));
		return errno;
	}
	if (mlock(b1, BSIZ)) {
		fprintf(stderr, "unable to lock memory: %s\n",
		    strerror(errno));
		return errno;
	}
	memset(b1, VALUE, BSIZ);

	b2 = malloc(BSIZ);
	if (b2 == NULL) {
		fprintf(stderr, "unable to allocate memory: %s\n",
		    strerror(errno));
		return errno;
	}
	if (mlock(b2, BSIZ)) {
		fprintf(stderr, "unable to lock memory: %s\n",
		    strerror(errno));
		return errno;
	}
	memset(b2, VALUE, BSIZ);

	for (i = 0; i < ITERATIONS; i++) {
		if (bcmp(b1, b2, BSIZ)) {
			fprintf(stderr, "buffers do not match\n");
		}
	}

	if (munlock(b1, BSIZ)) {
		fprintf(stderr, "unable to unlock memory: %s\n",
		    strerror(errno));
	}
	if (munlock(b2, BSIZ)) {
		fprintf(stderr, "unable to unlock memory: %s\n",
		    strerror(errno));
	}

	free(b1);
	free(b2);

	return 0;
}