Would you please retest this? I have attached my email showing a
simpler test that is less error-prone.

I can't come up with any scenario that would produce what you have
reported. Looking at function call cost, MemSet loop efficiency, and
memset loop efficiency, no combination of the three produces those
numbers.

The standard assumption is that function call overhead is significant,
and that memset is faster than the C MemSet macro. What compiler are you
using? Is the memset() call being inlined by the compiler? You will have
to look at the assembler code to be sure.

My only guess is that memset is inlined and that it is only moving
single bytes. If that is the case, there is no function call overhead,
and it would explain why MemSet gets faster relative to memset as the
buffer gets larger.
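
To make the byte-versus-word point concrete, here is a minimal sketch of the two kinds of fill loop. It is only loosely modelled on the idea behind MemSet() and is not the definition in c.h. The names fill_bytes, fill_words_or_memset, and SKETCH_LOOP_LIMIT are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff, standing in for MEMSET_LOOP_LIMIT. */
#define SKETCH_LOOP_LIMIT 64

/* Byte-at-a-time fill: roughly what a naive inlined memset would do. */
static void
fill_bytes(void *dst, int val, size_t len)
{
    unsigned char *p = dst;

    while (len-- > 0)
        *p++ = (unsigned char) val;
}

/*
 * Word-at-a-time zero fill. The loop is used only when the pointer and
 * length are long-aligned, the value is zero, and the length is within
 * the cutoff; every other case falls back to the library memset().
 * Assumption: the caller's buffer may legally be written through a long
 * pointer (for example, it was declared as an array of long).
 */
static void
fill_words_or_memset(void *dst, int val, size_t len)
{
    if (((uintptr_t) dst % sizeof(long)) == 0 &&
        (len % sizeof(long)) == 0 &&
        val == 0 &&
        len <= SKETCH_LOOP_LIMIT)
    {
        long *p = dst;
        long *stop = (long *) ((char *) dst + len);

        while (p < stop)
            *p++ = 0;
    }
    else
        memset(dst, val, len);
}
```

If the compiler expands memset() into something like fill_bytes(), it does one store per byte where the word loop does one store per long, so the word loop pulls further ahead as the buffer grows.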
---------------------------------------------------------------------------
Andrew Sullivan wrote:
> On Thu, Aug 29, 2002 at 01:27:41AM -0400, Neil Conway wrote:
> >
> > Also, if anyone would like to contribute the results of doing the
> > benchmark on their particular system, that might provide some useful
> > additional data points.
>
> Ok, here's a run on a Sun E450, Solaris 7. I presume your "total"
> time label corresponds to my "real" time. That's what I'm including,
> anyway.
>
> System Configuration: Sun Microsystems sun4u Sun Enterprise 450 (2
> X UltraSPARC-II 400MHz)
> System clock frequency: 100 MHz
> Memory size: 2560 Megabytes
>
> BUFFER_SIZE = 64
> MemSet(): 0m13.343s,12.567s,13.659s
> memset(): 0m1.255s,0m1.258s,0m1.254s
>
> BUFFER_SIZE = 128
> MemSet(): 0m21.347s,0m21.200s,0m20.541s
> memset(): 0m18.041s,0m17.963s,0m17.990s
>
> BUFFER_SIZE = 256
> MemSet(): 0m38.023s,0m37.480s,0m37.631s
> memset(): 0m25.969s,0m26.047s,0m26.012s
>
> BUFFER_SIZE = 512
> MemSet(): 1m9.226s,1m9.901s,1m10.148s
> memset(): 2m17.897s,2m18.310s,2m17.984s
>
> BUFFER_SIZE = 1024
> MemSet(): 2m13.690s,2m13.981s,2m13.206s
> memset(): 4m43.195s,4m43.405s,4m43.390s
>
> . . .at which point I gave up.
>
> A
>
> --
> ----
> Andrew Sullivan 204-4141 Yonge Street
> Liberty RMS Toronto, Ontario Canada
> <[EMAIL PROTECTED]> M2P 2A8
> +1 416 646 3304 x110
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>
--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
From: Bruce Momjian <[EMAIL PROTECTED]>
Subject: Re: [HACKERS] tweaking MemSet() performance
To: Neil Conway <[EMAIL PROTECTED]>
Date: Thu, 29 Aug 2002 15:37:26 -0400 (EDT)
cc: PostgreSQL Hackers <[EMAIL PROTECTED]>
I consider this a very good test. My last test is dated 1997/09/11, and
I think I had a dual Pentium Pro at that point; hardware has certainly
changed since then. I did try 128 at that time and found it to be slower,
but with newer hardware it is very possible that the crossover has moved
higher.

I remember being surprised, when writing that macro, that there was any
improvement at all, but obviously there is a gain and the gain is getting
bigger.
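I tested the following program:

    #include <stdlib.h>             /* atoi() */
    #include <string.h>
    #include "postgres.h"

    /* Raise the limit so MemSet() uses its own loop for every size tested. */
    #undef MEMSET_LOOP_LIMIT
    #define MEMSET_LOOP_LIMIT 1000000

    int
    main(int argc, char **argv)
    {
        int         len;
        long long   i;

        /* usage: tst1 <buffer size in bytes> */
        if (argc < 2 || (len = atoi(argv[1])) <= 0)
            return 1;

        char        buffer[len];    /* variable-length array */

        for (i = 0; i < 9900000; i++)
            MemSet(buffer, 0, len);
        return 0;
    }
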
and, yes, -O2 is significant! Looks like we use -O2 on all platforms
that use GCC so we should be OK there.
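
For anyone who wants to retest without relying on time(1), here is a rough sketch of a harness that times both fills in one process with clock(). It is not the program I ran. The names zero_by_words and time_fill are invented for this example, and the word loop is only a stand-in for the MemSet() macro, which needs postgres.h.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Zero a buffer one long at a time (stand-in for the MemSet() loop). */
static void
zero_by_words(long *buf, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        buf[i] = 0;
}

/*
 * Return the CPU seconds spent zeroing buf `iterations` times.
 * use_memset selects the library memset() instead of the word loop.
 */
static double
time_fill(long *buf, size_t nwords, long iterations, int use_memset)
{
    clock_t start = clock();
    long    i;

    for (i = 0; i < iterations; i++)
    {
        if (use_memset)
            memset(buf, 0, nwords * sizeof(long));
        else
            zero_by_words(buf, nwords);

        /*
         * Compiler barrier (GCC/Clang extension, an assumption about the
         * toolchain) so the optimizer cannot drop the repeated fills.
         */
        __asm__ __volatile__("" : : "r"(buf) : "memory");
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

A caller would run time_fill() once with use_memset = 0 and once with use_memset = 1 for each buffer size and compare the two results. An optimizing compiler may turn the word loop itself into a memset() call, so the assembler output still needs checking before the numbers can be trusted.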
I tested with the following script:
for TIME in 64 128 256 512 1024 2048 4096; do echo "*$TIME\c";
time tst1 $TIME; done
and got for MemSet:

    size    real         user         sys
    64      0m1.001s     0m1.000s     0m0.003s
    128     0m1.578s     0m1.567s     0m0.013s
    256     0m2.723s     0m2.723s     0m0.003s
    512     0m5.044s     0m5.029s     0m0.013s
    1024    0m9.621s     0m9.621s     0m0.003s
    2048    0m18.821s    0m18.811s    0m0.013s
    4096    0m37.266s    0m37.266s    0m0.003s

and for memset():

    size    real         user         sys
    64      0m1.813s     0m1.801s     0m0.014s
    128     0m2.489s     0m2.499s     0m0.994s
    256     0m4.397s     0m5.389s     0m0.005s
    512     0m5.186s     0m6.170s     0m0.015s
    1024    0m6.676s     0m6.676s     0m0.003s
    2048    0m9.766s     0m9.776s     0m0.994s
    4096    0m15.970s    0m15.954s    0m0.003s

so for BSD/OS, the break-even is 512.
I am on a dual P3/550 using gcc 2.95.2. My break-even is lower than
most because I have assembly language memset() functions in libc on
BSD/OS.
I suggest changing the MEMSET_LOOP_LIMIT to 512.
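
As a cross-check on the 512 figure, the small sketch below picks the largest tested size at which the MemSet loop is still at least as fast as memset(), using the "real" times from the BSD/OS run in this message. The function and array names are invented for the example.

```c
#include <stddef.h>

/*
 * Return the largest size at which the loop is at least as fast as
 * memset(), or 0 if memset() wins at every size. Arrays are in
 * ascending size order and hold elapsed seconds.
 */
static int
largest_size_where_loop_wins(const int *sizes, const double *loop_secs,
                             const double *memset_secs, size_t n)
{
    int     best = 0;
    size_t  i;

    for (i = 0; i < n; i++)
        if (loop_secs[i] <= memset_secs[i])
            best = sizes[i];
    return best;
}

/* "real" times in seconds from the BSD/OS run reported in this message. */
static const int    bsd_sizes[] = {64, 128, 256, 512, 1024, 2048, 4096};
static const double bsd_macro_secs[] =
    {1.001, 1.578, 2.723, 5.044, 9.621, 18.821, 37.266};
static const double bsd_libc_secs[] =
    {1.813, 2.489, 4.397, 5.186, 6.676, 9.766, 15.970};
```

Calling largest_size_where_loop_wins(bsd_sizes, bsd_macro_secs, bsd_libc_secs, 7) returns 512: the loop is ahead through 512 bytes (5.044s against 5.186s) and memset() is ahead from 1024 bytes on.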
---------------------------------------------------------------------------
Neil Conway wrote:
> In include/c.h, MemSet() is defined to be different than the stock
> function memset() only when copying less than or equal to
> MEMSET_LOOP_LIMIT bytes (currently 64). The comments above the macro
> definition note:
>
> * We got the 64 number by testing this against the stock memset() on
> * BSD/OS 3.0. Larger values were slower. bjm 1997/09/11
> *
> * I think the crossover point could be a good deal higher for
> * most platforms, actually. tgl 2000-03-19
>
> I decided to investigate Tom's suggestion and determine the
> performance of MemSet() versus memset() on my machine, for various
> values of MEMSET_LOOP_LIMIT. The machine this is being tested on is a
> Pentium 4 1.8 Ghz with RDRAM, running Linux 2.4.19pre8 with GCC 3.1.1
> and glibc 2.2.5 -- the results may or may not apply to other
> machines.
>
> The test program was:
>
> #include <string.h>
> #include "postgres.h"
>
> #undef MEMSET_LOOP_LIMIT
> #define MEMSET_LOOP_LIMIT BUFFER_SIZE
>
> int
> main(void)
> {
> char buffer[BUFFER_SIZE];
> long long i;
>
> for (i = 0; i < 99000000; i++)
> {
> MemSet(buffer, 0, sizeof(buffer));
> }
>
> return 0;
> }
>
> (I manually changed MemSet() to memset() when testing the performance
> of the latter function.)
>
> It was compiled like so:
>
> gcc -O2 -DBUFFER_SIZE=xxx -Ipgsql/src/include memset.c
>
> (The -O2 optimization flag is important: the results are significantly
> different if it is not used.)
>
> Here are the results (each timing is the 'total' listing from 'time
> ./a.out'):
>
> BUFFER_SIZE = 64
> MemSet() -> 2.756, 2.810, 2.789
> memset() -> 13.844, 13.782, 13.778
>
> BUFFER_SIZE = 128
> MemSet() -> 5.848, 5.989, 5.861
> memset() -> 15.637, 15.631, 15.631
>
> BUFFER_SIZE = 256
> MemSet() -> 9.602, 9.652, 9.633
> memset() -> 19.305, 19.370, 19.302
>
> BUFFER_SIZE = 512
> MemSet() -> 17.416, 17.462, 17.353
> memset() -> 26.657, 26.658, 26.678
>
> BUFFER_SIZE = 1024
> MemSet() -> 32.144, 32.179, 32.086
> memset() -> 41.186, 41.115, 41.176
>
> BUFFER_SIZE = 2048
> MemSet() -> 60.39, 60.48, 60.32
> memset() -> 71.19, 71.18, 71.17
>
> BUFFER_SIZE = 4096
> MemSet() -> 118.29, 120.07, 118.69
> memset() -> 131.40, 131.41
>
> ... at which point I stopped benchmarking.
>
> Is the benchmark above a reasonable assessment of memset() / MemSet()
> performance when copying word-aligned amounts of memory? If so, what's
> a good value for MEMSET_LOOP_LIMIT (perhaps 512)?
>
> Also, if anyone would like to contribute the results of doing the
> benchmark on their particular system, that might provide some useful
> additional data points.
>
> Cheers,
>
> Neil
>
> --
> Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073