On Fri, Nov 8, 2013 at 9:51 PM, Christian Schulte <c...@schulte.it> wrote:
> Ok. Reason I am asking is this:

...something that has nothing to do with COPTS or make release, the
ostensible thread subject...


> $ cc bcmp.c
> $ time ./a.out
>     0m17.83s real     0m16.92s user     0m0.87s system
> $ cc -O2 bcmp.c
> $ time ./a.out
>     1m0.98s real     1m0.17s user     0m0.87s system
> $ cc memcmp.c
> $ time ./a.out
>     0m17.41s real     0m16.56s user     0m0.87s system
> $ cc -O2 memcmp.c
> $ time ./a.out
>     1m1.03s real     1m0.18s user     0m0.88s system
>
> The difference comes from GCC optimizing away calls to the libc assembly
> versions of bcmp/memcmp. For 'len' values greater than or equal to 8
> (change BSIZ below), those assembly versions perform way better than
> 'cmpsb' inlined by GCC so that the following may be useful as most of
> the bcmp/memcmp calls are using 'len' values greater than 8 and the
> added function call overhead due to GCC no longer inlining these
> functions should not be an issue.

Sorry, but I don't really find your tests convincing.

* They only test the worst case: buffers that match over their whole
length.
* The example is unreasonably large (are there *any* 256MB memcmp or
bcmp calls in the kernel?)
* Using fprintf in the inner loop adds large fixed costs, including
syscalls, to what should be a microbenchmark.
* The timings cover the whole program run, not just the inner loop.
* No test shows that the suggested compiler options actually have the
suggested effect.

If you want to show that changing A will have a positive effect, your
test needs to simulate A as closely as you can.  This one doesn't seem
to do that.


Philip Guenther
