Would you please retest this? I have attached my email showing a simpler test that is less error-prone.
I can't come up with any scenario that would produce what you have
reported. Looking at function call cost, MemSet loop efficiency, and
memset loop efficiency, I can't find a combination that produces these
numbers. The standard assumption is that function call overhead is
significant, and that memset is faster than the C MemSet loop.

What compiler are you using? Is the memset() call being inlined by the
compiler? You will have to look at the assembler code to be sure.

My only guess is that memset is inlined and that it is only moving
single bytes. If that is the case, there is no function call overhead,
and it would explain why MemSet gets faster relative to memset() as the
buffer gets larger.

---------------------------------------------------------------------------

Andrew Sullivan wrote:
> On Thu, Aug 29, 2002 at 01:27:41AM -0400, Neil Conway wrote:
> >
> > Also, if anyone would like to contribute the results of doing the
> > benchmark on their particular system, that might provide some useful
> > additional data points.
>
> Ok, here's a run on a Sun E450, Solaris 7. I presume your "total"
> time label corresponds to my "real" time. That's what I'm including,
> anyway.
>
> System Configuration: Sun Microsystems sun4u Sun Enterprise 450 (2
> X UltraSPARC-II 400MHz)
> System clock frequency: 100 MHz
> Memory size: 2560 Megabytes
>
> BUFFER_SIZE = 64
>         MemSet(): 0m13.343s,12.567s,13.659s
>         memset(): 0m1.255s,0m1.258s,0m1.254s
>
> BUFFER_SIZE = 128
>         MemSet(): 0m21.347s,0m21.200s,0m20.541s
>         memset(): 0m18.041s,0m17.963s,0m17.990s
>
> BUFFER_SIZE = 256
>         MemSet(): 0m38.023s,0m37.480s,0m37.631s
>         memset(): 0m25.969s,0m26.047s,0m26.012s
>
> BUFFER_SIZE = 512
>         MemSet(): 1m9.226s,1m9.901s,1m10.148s
>         memset(): 2m17.897s,2m18.310s,2m17.984s
>
> BUFFER_SIZE = 1024
>         MemSet(): 2m13.690s,2m13.981s,2m13.206s
>         memset(): 4m43.195s,4m43.405s,4m43.390s
>
> . . .at which point I gave up.
>
> A
>
> --
> ----
> Andrew Sullivan                         204-4141 Yonge Street
> Liberty RMS                           Toronto, Ontario Canada
> <[EMAIL PROTECTED]>                              M2P 2A8
>                                         +1 416 646 3304 x110
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
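
For anyone following the reasoning above, the two code paths being compared can be sketched roughly as below. This is an illustration only, not the macro from include/c.h: the function name sketch_memset_zero and the constant SKETCH_LOOP_LIMIT are hypothetical, and the real macro's alignment test and types may differ. The idea is that the inline word loop avoids a function call, while the fallback pays the call cost but may reach a tuned libc routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for MEMSET_LOOP_LIMIT; 64 is the value quoted in the thread. */
#define SKETCH_LOOP_LIMIT 64

/*
 * Zero len bytes starting at dst.
 * Returns 1 if the inline word-at-a-time loop was used,
 * 0 if the call fell back to the library memset().
 */
int
sketch_memset_zero(void *dst, size_t len)
{
    const uintptr_t mask = sizeof(long) - 1;

    assert(dst != NULL);

    /* The loop is only valid for word-aligned starts and word-multiple lengths. */
    if (((uintptr_t) dst & mask) == 0 &&
        (len & mask) == 0 &&
        len <= SKETCH_LOOP_LIMIT)
    {
        long *p = (long *) dst;
        long *stop = (long *) ((char *) dst + len);

        while (p < stop)
            *p++ = 0;
        return 1;
    }

    /* Anything else goes to libc, paying the function call overhead. */
    memset(dst, 0, len);
    return 0;
}
```

Under these assumptions, a 64-byte aligned buffer takes the loop path and a 128-byte buffer goes to memset(). If the compiler inlines memset() as a byte-at-a-time loop, the second path loses both of its usual advantages, which is the scenario guessed at above.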
>From [EMAIL PROTECTED] Thu Aug 29 15:39:08 2002
From: Bruce Momjian <[EMAIL PROTECTED]>
Message-ID: <[EMAIL PROTECTED]>
Subject: Re: [HACKERS] tweaking MemSet() performance
In-Reply-To: <[EMAIL PROTECTED]>
To: Neil Conway <[EMAIL PROTECTED]>
Date: Thu, 29 Aug 2002 15:37:26 -0400 (EDT)
cc: PostgreSQL Hackers <[EMAIL PROTECTED]>

I consider this a very good test. My last test is dated 1997/09/11; I
think I may have had a dual Pentium Pro at that point, and hardware has
certainly changed since then. I did try 128 at that time and found it
to be slower, but with newer hardware it is very possible that has
changed.
I remember in writing that macro how surprised I was that there was any
improvement, but obviously there is a gain and the gain is getting
bigger.

I tested the following program:

    #include <string.h>
    #include "postgres.h"

    #undef  MEMSET_LOOP_LIMIT
    #define MEMSET_LOOP_LIMIT  1000000

    int
    main(int argc, char **argv)
    {
        int         len = atoi(argv[1]);
        char        buffer[len];
        long long   i;

        for (i = 0; i < 9900000; i++)
            MemSet(buffer, 0, len);
        return 0;
    }

and, yes, -O2 is significant! Looks like we use -O2 on all platforms
that use GCC so we should be OK there.

I tested with the following script:

    for TIME in 64 128 256 512 1024 2048 4096; do echo "*$TIME\c";
    time tst1 $TIME; done

and got for MemSet:

    size     real         user         sys
    *64      0m1.001s     0m1.000s     0m0.003s
    *128     0m1.578s     0m1.567s     0m0.013s
    *256     0m2.723s     0m2.723s     0m0.003s
    *512     0m5.044s     0m5.029s     0m0.013s
    *1024    0m9.621s     0m9.621s     0m0.003s
    *2048    0m18.821s    0m18.811s    0m0.013s
    *4096    0m37.266s    0m37.266s    0m0.003s

and for memset():

    size     real         user         sys
    *64      0m1.813s     0m1.801s     0m0.014s
    *128     0m2.489s     0m2.499s     0m0.994s
    *256     0m4.397s     0m5.389s     0m0.005s
    *512     0m5.186s     0m6.170s     0m0.015s
    *1024    0m6.676s     0m6.676s     0m0.003s
    *2048    0m9.766s     0m9.776s     0m0.994s
    *4096    0m15.970s    0m15.954s    0m0.003s

so for BSD/OS, the break-even is 512. I am on a dual P3/550 using
2.95.2. I will tell you exactly why my break-even is lower than most ---
I have assembly language memset() functions in libc on BSD/OS.

I suggest changing the MEMSET_LOOP_LIMIT to 512.

---------------------------------------------------------------------------

Neil Conway wrote:
> In include/c.h, MemSet() is defined to be different than the stock
> function memset() only when copying less than or equal to
> MEMSET_LOOP_LIMIT bytes (currently 64).
> The comments above the macro
> definition note:
>
>  *      We got the 64 number by testing this against the stock memset() on
>  *      BSD/OS 3.0. Larger values were slower.  bjm 1997/09/11
>  *
>  *      I think the crossover point could be a good deal higher for
>  *      most platforms, actually.  tgl 2000-03-19
>
> I decided to investigate Tom's suggestion and determine the
> performance of MemSet() versus memset() on my machine, for various
> values of MEMSET_LOOP_LIMIT. The machine this is being tested on is a
> Pentium 4 1.8 Ghz with RDRAM, running Linux 2.4.19pre8 with GCC 3.1.1
> and glibc 2.2.5 -- the results may or may not apply to other
> machines.
>
> The test program was:
>
> #include <string.h>
> #include "postgres.h"
>
> #undef MEMSET_LOOP_LIMIT
> #define MEMSET_LOOP_LIMIT BUFFER_SIZE
>
> int
> main(void)
> {
>         char buffer[BUFFER_SIZE];
>         long long i;
>
>         for (i = 0; i < 99000000; i++)
>         {
>                 MemSet(buffer, 0, sizeof(buffer));
>         }
>
>         return 0;
> }
>
> (I manually changed MemSet() to memset() when testing the performance
> of the latter function.)
>
> It was compiled like so:
>
>         gcc -O2 -DBUFFER_SIZE=xxx -Ipgsql/src/include memset.c
>
> (The -O2 optimization flag is important: the results are significantly
> different if it is not used.)
>
> Here are the results (each timing is the 'total' listing from 'time
> ./a.out'):
>
> BUFFER_SIZE = 64
>         MemSet() -> 2.756, 2.810, 2.789
>         memset() -> 13.844, 13.782, 13.778
>
> BUFFER_SIZE = 128
>         MemSet() -> 5.848, 5.989, 5.861
>         memset() -> 15.637, 15.631, 15.631
>
> BUFFER_SIZE = 256
>         MemSet() -> 9.602, 9.652, 9.633
>         memset() -> 19.305, 19.370, 19.302
>
> BUFFER_SIZE = 512
>         MemSet() -> 17.416, 17.462, 17.353
>         memset() -> 26.657, 26.658, 26.678
>
> BUFFER_SIZE = 1024
>         MemSet() -> 32.144, 32.179, 32.086
>         memset() -> 41.186, 41.115, 41.176
>
> BUFFER_SIZE = 2048
>         MemSet() -> 60.39, 60.48, 60.32
>         memset() -> 71.19, 71.18, 71.17
>
> BUFFER_SIZE = 4096
>         MemSet() -> 118.29, 120.07, 118.69
>         memset() -> 131.40, 131.41
>
> ...
> at which point I stopped benchmarking.
>
> Is the benchmark above a reasonable assessment of memset() / MemSet()
> performance when copying word-aligned amounts of memory? If so, what's
> a good value for MEMSET_LOOP_LIMIT (perhaps 512)?
>
> Also, if anyone would like to contribute the results of doing the
> benchmark on their particular system, that might provide some useful
> additional data points.
>
> Cheers,
>
> Neil
>
> --
> Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
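
The break-even reading of the BSD/OS timings in the attached message can be checked mechanically. The sketch below is only an illustration: the function name break_even_size and the array names are hypothetical, "break-even" is taken to mean the largest tested size at which the inline loop is still at least as fast as memset(), and the numbers are the "real" times from the two BSD/OS tables, in seconds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the largest buffer size at which the inline loop is still at
 * least as fast as memset(), or 0 if it never is.
 * sizes[] is assumed to be in ascending order.
 */
size_t
break_even_size(const size_t *sizes, const double *loop_secs,
                const double *memset_secs, size_t n)
{
    size_t best = 0;
    size_t i;

    assert(sizes != NULL && loop_secs != NULL && memset_secs != NULL);

    for (i = 0; i < n; i++)
    {
        if (loop_secs[i] <= memset_secs[i])
            best = sizes[i];
    }
    return best;
}

/* "real" times in seconds from the BSD/OS run reported in the attached message. */
static const size_t bsd_sizes[]  = {64, 128, 256, 512, 1024, 2048, 4096};
static const double bsd_loop[]   = {1.001, 1.578, 2.723, 5.044, 9.621, 18.821, 37.266};
static const double bsd_memset[] = {1.813, 2.489, 4.397, 5.186, 6.676, 9.766, 15.970};
```

Calling break_even_size(bsd_sizes, bsd_loop, bsd_memset, 7) gives 512, matching the figure stated for BSD/OS. Fed the Pentium 4 numbers from the quoted message, the same function would return 4096, the largest size tested there, since MemSet() wins at every size in that run.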