I consider this a very good test. My last test is dated 1997/09/11, and I think I may have had a dual Pentium Pro at that point; hardware has certainly changed since then. I did try 128 at that time and found it to be slower, but with newer hardware it is very possible that a larger limit now wins.
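For anyone who has not looked at c.h, the idea behind the macro can be sketched roughly as follows. This is a simplified illustration, not the actual definition: the names memset_sketch and SKETCH_LOOP_LIMIT are made up for the example, it is written as a function rather than a macro, and the real code differs in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold for this sketch; the thread is about tuning it. */
#define SKETCH_LOOP_LIMIT 64

/*
 * Zero small, word-aligned regions with an inline loop and hand everything
 * else to the library memset().  The caller is assumed to pass storage that
 * may validly be accessed as long (e.g. malloc'd or long-typed memory).
 */
static void
memset_sketch(void *start, int val, size_t len)
{
    assert(start != NULL);

    if (((uintptr_t) start % sizeof(long)) == 0 &&  /* aligned start */
        (len % sizeof(long)) == 0 &&                /* whole words only */
        val == 0 &&                                 /* loop only zeroes */
        len <= SKETCH_LOOP_LIMIT)                   /* small request */
    {
        long       *p = (long *) start;
        long       *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, val, len);    /* large, unaligned, or nonzero fill */
}
```

Raising the limit only widens the range of sizes that take the inline loop; anything above it still goes to memset().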
I remember when writing that macro how surprised I was that there was any improvement at all, but obviously there is a gain and the gain is getting bigger.

I tested the following program:

    #include <stdlib.h>     /* atoi() */
    #include <string.h>
    #include "postgres.h"

    /* high enough that the size limit never forces the memset() fallback */
    #undef MEMSET_LOOP_LIMIT
    #define MEMSET_LOOP_LIMIT 1000000

    int
    main(int argc, char **argv)
    {
        int         len = atoi(argv[1]);    /* buffer size in bytes */
        char        buffer[len];
        long long   i;

        for (i = 0; i < 9900000; i++)
            MemSet(buffer, 0, len);
        return 0;
    }

and, yes, -O2 is significant! Looks like we use -O2 on all platforms that use GCC so we should be OK there.

I tested with the following script:

    for TIME in 64 128 256 512 1024 2048 4096
    do
        echo "*$TIME\c"
        time tst1 $TIME
    done

and got for MemSet:

    *64     real 0m1.001s    user 0m1.000s    sys 0m0.003s
    *128    real 0m1.578s    user 0m1.567s    sys 0m0.013s
    *256    real 0m2.723s    user 0m2.723s    sys 0m0.003s
    *512    real 0m5.044s    user 0m5.029s    sys 0m0.013s
    *1024   real 0m9.621s    user 0m9.621s    sys 0m0.003s
    *2048   real 0m18.821s   user 0m18.811s   sys 0m0.013s
    *4096   real 0m37.266s   user 0m37.266s   sys 0m0.003s

and for memset():

    *64     real 0m1.813s    user 0m1.801s    sys 0m0.014s
    *128    real 0m2.489s    user 0m2.499s    sys 0m0.994s
    *256    real 0m4.397s    user 0m5.389s    sys 0m0.005s
    *512    real 0m5.186s    user 0m6.170s    sys 0m0.015s
    *1024   real 0m6.676s    user 0m6.676s    sys 0m0.003s
    *2048   real 0m9.766s    user 0m9.776s    sys 0m0.994s
    *4096   real 0m15.970s   user 0m15.954s   sys 0m0.003s

so for BSD/OS, the break-even is 512: MemSet is still slightly ahead at 512 and clearly behind at 1024. I am on a dual P3/550 using GCC 2.95.2. I will tell you exactly why my break-even is lower than most: I have assembly language memset() functions in libc on BSD/OS.

I suggest changing the MEMSET_LOOP_LIMIT to 512.

---------------------------------------------------------------------------

Neil Conway wrote:
> In include/c.h, MemSet() is defined to be different than the stock
> function memset() only when copying less than or equal to
> MEMSET_LOOP_LIMIT bytes (currently 64).
> The comments above the macro
> definition note:
>
>  * We got the 64 number by testing this against the stock memset() on
>  * BSD/OS 3.0. Larger values were slower.  bjm 1997/09/11
>  *
>  * I think the crossover point could be a good deal higher for
>  * most platforms, actually.  tgl 2000-03-19
>
> I decided to investigate Tom's suggestion and determine the
> performance of MemSet() versus memset() on my machine, for various
> values of MEMSET_LOOP_LIMIT. The machine this is being tested on is a
> Pentium 4 1.8 Ghz with RDRAM, running Linux 2.4.19pre8 with GCC 3.1.1
> and glibc 2.2.5 -- the results may or may not apply to other
> machines.
>
> The test program was:
>
> #include <string.h>
> #include "postgres.h"
>
> #undef MEMSET_LOOP_LIMIT
> #define MEMSET_LOOP_LIMIT BUFFER_SIZE
>
> int
> main(void)
> {
>     char buffer[BUFFER_SIZE];
>     long long i;
>
>     for (i = 0; i < 99000000; i++)
>     {
>         MemSet(buffer, 0, sizeof(buffer));
>     }
>
>     return 0;
> }
>
> (I manually changed MemSet() to memset() when testing the performance
> of the latter function.)
>
> It was compiled like so:
>
>         gcc -O2 -DBUFFER_SIZE=xxx -Ipgsql/src/include memset.c
>
> (The -O2 optimization flag is important: the results are significantly
> different if it is not used.)
>
> Here are the results (each timing is the 'total' listing from 'time
> ./a.out'):
>
> BUFFER_SIZE = 64
>         MemSet() -> 2.756, 2.810, 2.789
>         memset() -> 13.844, 13.782, 13.778
>
> BUFFER_SIZE = 128
>         MemSet() -> 5.848, 5.989, 5.861
>         memset() -> 15.637, 15.631, 15.631
>
> BUFFER_SIZE = 256
>         MemSet() -> 9.602, 9.652, 9.633
>         memset() -> 19.305, 19.370, 19.302
>
> BUFFER_SIZE = 512
>         MemSet() -> 17.416, 17.462, 17.353
>         memset() -> 26.657, 26.658, 26.678
>
> BUFFER_SIZE = 1024
>         MemSet() -> 32.144, 32.179, 32.086
>         memset() -> 41.186, 41.115, 41.176
>
> BUFFER_SIZE = 2048
>         MemSet() -> 60.39, 60.48, 60.32
>         memset() -> 71.19, 71.18, 71.17
>
> BUFFER_SIZE = 4096
>         MemSet() -> 118.29, 120.07, 118.69
>         memset() -> 131.40, 131.41
>
> ...
> at which point I stopped benchmarking.
>
> Is the benchmark above a reasonable assessment of memset() / MemSet()
> performance when copying word-aligned amounts of memory? If so, what's
> a good value for MEMSET_LOOP_LIMIT (perhaps 512)?
>
> Also, if anyone would like to contribute the results of doing the
> benchmark on their particular system, that might provide some useful
> additional data points.
>
> Cheers,
>
> Neil
>
> --
> Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073