On 11/09/13 22:21, Philip Guenther wrote:
> 
> Sorry, but I don't really find your tests convincing.
> 
> * Only test the worst case of a matching buffer.
> * Unreasonably large example used (are there *any* 256MB memcmp or
> bcmp in the kernel?)
> * Use of fprintf in the inner loop adds large fixed costs including
> syscalls to what should be a microbenchmark.
> * Measurements aren't of just the inner loop.
> * No test showing the suggested compiler options actually have the
> suggested effect.
> 
> If you want to show that changing A will have a positive effect, you
> need to have your test be as close a simulation of A as you can.  This
> doesn't seem to be that.
> 

All valid points. I had already compared the generated code using
'objdump -d'. As soon as GCC is told to optimize, it inlines
bcmp/memcmp as 'repz cmpsb' (see
'/usr/src/gnu/gcc/gcc/config/i386/i386.md', lines 18622ff, and the
function 'expand_builtin_memcmp' in '/usr/src/gnu/gcc/gcc/builtins.c').
The inlined version is slower than the library call even when comparing
just a few bytes, and the gap widens as the number of bytes grows. See
the last result below, for just 128 bytes (0m29.58s vs. 0m5.32s user
time).

$ cc -DBSIZ=4 -DITERATIONS=1000000000 -O2 bcmp.c

    0m23.54s real     0m23.24s user     0m0.00s system

$ cc -DBSIZ=4 -DITERATIONS=1000000000 -O2 -fno-builtin-bcmp \
    -fno-builtin-memcmp bcmp.c

    0m18.79s real     0m18.76s user     0m0.00s system

$ cc -DBSIZ=8 -DITERATIONS=1000000000 -O2 bcmp.c

    0m32.46s real     0m32.15s user     0m0.00s system

$ cc -DBSIZ=8 -DITERATIONS=1000000000 -O2 -fno-builtin-bcmp \
    -fno-builtin-memcmp bcmp.c

    0m20.03s real     0m20.00s user     0m0.00s system

$ cc -DBSIZ=16 -DITERATIONS=1000000000 -O2 bcmp.c

    0m49.81s real     0m49.78s user     0m0.00s system

$ cc -DBSIZ=16 -DITERATIONS=1000000000 -O2 -fno-builtin-bcmp \
    -fno-builtin-memcmp bcmp.c

    0m22.62s real     0m22.57s user     0m0.00s system

$ cc -DBSIZ=128 -DITERATIONS=100000000 -O2 bcmp.c

    0m29.66s real     0m29.58s user     0m0.00s system

$ cc -DBSIZ=128 -DITERATIONS=100000000 -O2 -fno-builtin-bcmp \
    -fno-builtin-memcmp bcmp.c

    0m5.33s real     0m5.32s user     0m0.00s system
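
To make the 'objdump -d' comparison easier to repeat, a minimal sketch
of a probe file follows. The file name probe.c, the function name
probe_bcmp and the default BSIZ are made up for illustration and are
not part of the original test. The expectation stated in the comment
is only what is reported above for this GCC; other compilers or
versions may generate different code.

```c
#include <strings.h>

#ifndef BSIZ
#define BSIZ 16         /* assumed default; override with -DBSIZ=... */
#endif

/*
 * Hypothetical probe: a single bcmp() call with a constant length, so
 * the disassembly is short enough to read.
 *
 * Build it twice and compare the output of objdump:
 *
 *   cc -O2 -c probe.c && objdump -d probe.o
 *   cc -O2 -fno-builtin-bcmp -fno-builtin-memcmp -c probe.c && \
 *       objdump -d probe.o
 *
 * Per the observation above, the first build should show an inlined
 * 'repz cmpsb' and the second a call to bcmp.
 */
int
probe_bcmp(const void *b1, const void *b2)
{
        return bcmp(b1, b2, BSIZ);
}
```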


$ cat bcmp.c
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/mman.h>

/* BSIZ and ITERATIONS are supplied on the command line via -D. */
#define VALUE (0xff)

int
main(int argc, char *argv[])
{
        void           *b1, *b2;
        int             i;

        b1 = malloc(BSIZ);
        if (b1 == NULL) {
                fprintf(stderr, "unable to allocate memory: %s\n",
                        strerror(errno));
                return errno;
        }
        if (mlock(b1, BSIZ)) {
                fprintf(stderr, "unable to lock memory: %s\n",
                        strerror(errno));
                return errno;
        }
        memset(b1, VALUE, BSIZ);

        b2 = malloc(BSIZ);
        if (b2 == NULL) {
                fprintf(stderr, "unable to allocate memory: %s\n",
                        strerror(errno));
                return errno;
        }
        if (mlock(b2, BSIZ)) {
                fprintf(stderr, "unable to lock memory: %s\n",
                        strerror(errno));
                return errno;
        }
        memset(b2, VALUE, BSIZ);

        for (i = 0; i < ITERATIONS; i++) {
                if (bcmp(b1, b2, BSIZ)) {
                        fprintf(stderr, "buffers do not match\n");
                }
        }

        if (munlock(b1, BSIZ)) {
                fprintf(stderr, "unable to unlock memory: %s\n",
                        strerror(errno));
        }
        if (munlock(b2, BSIZ)) {
                fprintf(stderr, "unable to unlock memory: %s\n",
                        strerror(errno));
        }
        free(b1);
        free(b2);

        return 0;
}
