I consider this a very good test.  My last test is dated 1997/09/11, and
I think I may have had a dual Pentium Pro at that point; hardware has
certainly changed since then.  I did try 128 at that time and found it
to be slower, but with newer hardware it is very possible the result has
improved.

I remember, in writing that macro, how surprised I was that there was
any improvement at all, but obviously there is a gain and the gain is
getting bigger.

I tested the following program:
                
        #include <string.h>
        #include "postgres.h"
        
        #undef  MEMSET_LOOP_LIMIT
        #define MEMSET_LOOP_LIMIT  1000000
        
        int
        main(int argc, char **argv)
        {
                int             len = atoi(argv[1]);
                char            buffer[len];
                long long       i;
        
                for (i = 0; i < 9900000; i++)
                        MemSet(buffer, 0, len);
                return 0;
        }

and, yes, -O2 is significant!  Looks like we use -O2 on all platforms
that use GCC so we should be OK there.
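
For anyone who wants to time this in-process rather than with time(1),
here is a rough sketch.  It is not what I ran above.  The helper name
time_libc_memset() is made up for illustration, and the volatile
function pointer is only there so that -O2 cannot inline or discard the
memset() calls:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper, not part of PostgreSQL: CPU seconds spent zeroing
 * the first len bytes of buf, iters times, with the libc memset().
 */
double
time_libc_memset(char *buf, size_t len, long iters)
{
    /*
     * Calling through a volatile pointer keeps the optimizer from
     * inlining or dropping the calls, so the loop really runs.
     */
    void       *(*volatile fill) (void *, int, size_t) = memset;
    clock_t     start;
    clock_t     stop;
    long        i;

    assert(buf != NULL && iters >= 0);

    start = clock();
    for (i = 0; i < iters; i++)
        fill(buf, 0, len);
    stop = clock();

    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

The same pattern would apply to the MemSet() side, though a new enough
compiler may turn a plain zeroing loop into a memset() call on its own,
so check the generated code before trusting the numbers.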

I tested with the following script:

        for TIME in 64 128 256 512 1024 2048 4096; do echo "*$TIME\c";
        time tst1 $TIME; done

and got for MemSet:
        
        *64
        real    0m1.001s
        user    0m1.000s
        sys     0m0.003s
        *128
        real    0m1.578s
        user    0m1.567s
        sys     0m0.013s
        *256
        real    0m2.723s
        user    0m2.723s
        sys     0m0.003s
        *512
        real    0m5.044s
        user    0m5.029s
        sys     0m0.013s
        *1024
        real    0m9.621s
        user    0m9.621s
        sys     0m0.003s
        *2048
        real    0m18.821s
        user    0m18.811s
        sys     0m0.013s
        *4096
        real    0m37.266s
        user    0m37.266s
        sys     0m0.003s

and for memset():
        
        *64
        real    0m1.813s
        user    0m1.801s
        sys     0m0.014s
        *128
        real    0m2.489s
        user    0m2.499s
        sys     0m0.994s
        *256
        real    0m4.397s
        user    0m5.389s
        sys     0m0.005s
        *512
        real    0m5.186s
        user    0m6.170s
        sys     0m0.015s
        *1024
        real    0m6.676s
        user    0m6.676s
        sys     0m0.003s
        *2048
        real    0m9.766s
        user    0m9.776s
        sys     0m0.994s
        *4096
        real    0m15.970s
        user    0m15.954s
        sys     0m0.003s

so for BSD/OS, the break-even is 512.
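
To make the arithmetic explicit, the sketch below takes the "real"
times from the two tables above and reports the largest size at which
the MemSet() loop is still no slower than memset().  The function and
array names are invented for this example.  Note that the 512 row is
nearly a tie (5.044 vs. 5.186), so the true crossover sits somewhere
between 512 and 1024:

```c
#include <assert.h>
#include <stddef.h>

/* "real" seconds copied from the MemSet and memset() tables above */
static const int    bsdos_sizes[] = {64, 128, 256, 512, 1024, 2048, 4096};
static const double bsdos_loop[] = {1.001, 1.578, 2.723, 5.044, 9.621, 18.821, 37.266};
static const double bsdos_libc[] = {1.813, 2.489, 4.397, 5.186, 6.676, 9.766, 15.970};

/*
 * Hypothetical helper: largest size whose loop time is <= its libc time,
 * or 0 if the loop never wins.
 */
int
largest_size_loop_wins(const int *sz, const double *loop,
                       const double *libc, size_t n)
{
    int         best = 0;
    size_t      i;

    assert(sz != NULL && loop != NULL && libc != NULL);

    for (i = 0; i < n; i++)
    {
        if (loop[i] <= libc[i])
            best = sz[i];
    }
    return best;
}

/* Convenience wrapper over the BSD/OS numbers above. */
int
bsdos_break_even(void)
{
    return largest_size_loop_wins(bsdos_sizes, bsdos_loop, bsdos_libc,
                                  sizeof(bsdos_sizes) / sizeof(bsdos_sizes[0]));
}
```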

I am on a dual P3/550 using GCC 2.95.2.  I will tell you exactly why my
break-even is lower than most --- I have assembly language memset()
functions in libc on BSD/OS.

I suggest changing the MEMSET_LOOP_LIMIT to 512.
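
For readers without c.h in front of them, this is roughly what the limit
controls.  The function below is a simplified stand-in written for this
message, not the macro text from c.h.  The name memset_sketch(), the
SKETCH_LOOP_LIMIT constant, and the return value are all invented here,
and the alignment and zero-value checks reflect my understanding of the
macro rather than a quotation of it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value, per the suggestion above; c.h currently uses 64. */
#define SKETCH_LOOP_LIMIT 512

/*
 * Hypothetical stand-in for MemSet().  Returns 1 if the inline word loop
 * was used and 0 if it fell back to the libc memset().
 */
int
memset_sketch(void *start, int val, size_t len)
{
    const uintptr_t mask = sizeof(int32_t) - 1;

    assert(start != NULL);

    if (((uintptr_t) start & mask) == 0 &&  /* word-aligned start */
        (len & mask) == 0 &&                /* whole number of words */
        val == 0 &&                         /* the loop only zeroes */
        len <= SKETCH_LOOP_LIMIT)           /* small enough to loop */
    {
        int32_t    *p = (int32_t *) start;
        int32_t    *stop = (int32_t *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
        return 1;
    }

    /* Anything else goes to the stock function. */
    memset(start, val, len);
    return 0;
}
```

With the limit at 512, a 512-byte aligned zeroing request takes the loop
and a 1024-byte one goes to memset(), which matches where the timings
above cross over on this machine.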

---------------------------------------------------------------------------

Neil Conway wrote:
> In include/c.h, MemSet() is defined to be different than the stock
> function memset() only when setting less than or equal to
> MEMSET_LOOP_LIMIT bytes (currently 64). The comments above the macro
> definition note:
> 
>  *    We got the 64 number by testing this against the stock memset() on
>  *    BSD/OS 3.0. Larger values were slower.  bjm 1997/09/11
>  *
>  *    I think the crossover point could be a good deal higher for
>  *    most platforms, actually.  tgl 2000-03-19
> 
> I decided to investigate Tom's suggestion and determine the
> performance of MemSet() versus memset() on my machine, for various
> values of MEMSET_LOOP_LIMIT. The machine this is being tested on is a
> Pentium 4 1.8 GHz with RDRAM, running Linux 2.4.19pre8 with GCC 3.1.1
> and glibc 2.2.5 -- the results may or may not apply to other
> machines.
> 
> The test program was:
> 
> #include <string.h>
> #include "postgres.h"
> 
> #undef MEMSET_LOOP_LIMIT
> #define MEMSET_LOOP_LIMIT BUFFER_SIZE
> 
> int
> main(void)
> {
>       char buffer[BUFFER_SIZE];
>       long long i;
> 
>       for (i = 0; i < 99000000; i++)
>       {
>               MemSet(buffer, 0, sizeof(buffer));
>       }
> 
>       return 0;
> }
> 
> (I manually changed MemSet() to memset() when testing the performance
> of the latter function.)
> 
> It was compiled like so:
> 
>         gcc -O2 -DBUFFER_SIZE=xxx -Ipgsql/src/include memset.c
> 
> (The -O2 optimization flag is important: the results are significantly
> different if it is not used.)
> 
> Here are the results (each timing is the 'total' listing from 'time
> ./a.out'):
> 
> BUFFER_SIZE = 64
>         MemSet() -> 2.756, 2.810, 2.789
>         memset() -> 13.844, 13.782, 13.778
> 
> BUFFER_SIZE = 128
>         MemSet() -> 5.848, 5.989, 5.861
>         memset() -> 15.637, 15.631, 15.631
> 
> BUFFER_SIZE = 256
>         MemSet() -> 9.602, 9.652, 9.633
>         memset() -> 19.305, 19.370, 19.302
> 
> BUFFER_SIZE = 512
>         MemSet() -> 17.416, 17.462, 17.353
>         memset() -> 26.657, 26.658, 26.678
> 
> BUFFER_SIZE = 1024
>         MemSet() -> 32.144, 32.179, 32.086
>         memset() -> 41.186, 41.115, 41.176
> 
> BUFFER_SIZE = 2048
>         MemSet() -> 60.39, 60.48, 60.32
>         memset() -> 71.19, 71.18, 71.17
> 
> BUFFER_SIZE = 4096
>         MemSet() -> 118.29, 120.07, 118.69
>         memset() -> 131.40, 131.41
> 
> ... at which point I stopped benchmarking.
> 
> Is the benchmark above a reasonable assessment of memset() / MemSet()
> performance when setting word-aligned amounts of memory? If so, what's
> a good value for MEMSET_LOOP_LIMIT (perhaps 512)?
> 
> Also, if anyone would like to contribute the results of doing the
> benchmark on their particular system, that might provide some useful
> additional data points.
> 
> Cheers,
> 
> Neil
> 
> -- 
> Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
