Would you please retest this?  I have attached my email showing a
simpler test that is less error-prone.

I can't come up with any scenario that would produce what you have
reported.  No combination of function call cost, MemSet loop
efficiency, and memset loop efficiency gives me numbers like yours.

The standard assumption is that function call overhead is significant,
and that memset is faster than C MemSet.  What compiler are you using?
Is the memset() call being inlined by the compiler?  You will have to
look at the assembler code to be sure.

My only guess is that memset is inlined and that it is only moving
single bytes.  If that is the case, there is no function call overhead
for memset to pay, and it would explain why MemSet gets faster relative
to memset as the buffer gets larger.
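
To illustrate that guess, here is a minimal sketch of the two store
patterns.  It is not taken from any compiler's inlined memset() or from
c.h; fill_bytes() and fill_words() are hypothetical names, and each one
returns the number of stores it issued so the difference is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-at-a-time zero fill: one store per byte. */
static size_t
fill_bytes(void *dst, size_t len)
{
    unsigned char *p = dst;
    size_t      stores = 0;

    while (len-- > 0)
    {
        *p++ = 0;
        stores++;
    }
    return stores;
}

/*
 * Hypothetical word-at-a-time zero fill: one store per long.
 * Assumes dst is long-aligned and len is a multiple of sizeof(long).
 */
static size_t
fill_words(void *dst, size_t len)
{
    long       *p = dst;
    long       *stop = (long *) ((char *) dst + len);
    size_t      stores = 0;

    assert((uintptr_t) dst % sizeof(long) == 0);
    assert(len % sizeof(long) == 0);

    while (p < stop)
    {
        *p++ = 0;
        stores++;
    }
    return stores;
}
```

The byte loop issues sizeof(long) times as many stores as the word loop
for the same buffer, so if an inlined memset really works a byte at a
time, I would expect it to fall further behind a word loop as the
buffer grows.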

---------------------------------------------------------------------------

Andrew Sullivan wrote:
> On Thu, Aug 29, 2002 at 01:27:41AM -0400, Neil Conway wrote:
> > 
> > Also, if anyone would like to contribute the results of doing the
> > benchmark on their particular system, that might provide some useful
> > additional data points.
> 
> Ok, here's a run on a Sun E450, Solaris 7.  I presume your "total"
> time label corresponds to my "real" time.  That's what I'm including,
> anyway.
> 
> System Configuration:  Sun Microsystems  sun4u Sun Enterprise 450 (2
> X UltraSPARC-II 400MHz)
> System clock frequency: 100 MHz
> Memory size: 2560 Megabytes
> 
> BUFFER_SIZE = 64
>         MemSet(): 0m13.343s,0m12.567s,0m13.659s
>         memset(): 0m1.255s,0m1.258s,0m1.254s
>         
> BUFFER_SIZE = 128
>         MemSet(): 0m21.347s,0m21.200s,0m20.541s
>         memset(): 0m18.041s,0m17.963s,0m17.990s
>         
> BUFFER_SIZE = 256
>         MemSet(): 0m38.023s,0m37.480s,0m37.631s
>         memset(): 0m25.969s,0m26.047s,0m26.012s
>         
> BUFFER_SIZE = 512
>         MemSet(): 1m9.226s,1m9.901s,1m10.148s
>         memset(): 2m17.897s,2m18.310s,2m17.984s
> 
> BUFFER_SIZE = 1024
>         MemSet(): 2m13.690s,2m13.981s,2m13.206s
>         memset(): 4m43.195s,4m43.405s,4m43.390s
> 
> . . .at which point I gave up.
> 
> A
> 
> -- 
> ----
> Andrew Sullivan                         204-4141 Yonge Street
> Liberty RMS                           Toronto, Ontario Canada
> <[EMAIL PROTECTED]>                              M2P 2A8
>                                          +1 416 646 3304 x110
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
>From [EMAIL PROTECTED] Thu Aug 29 15:39:08 2002
From: Bruce Momjian <[EMAIL PROTECTED]>
Message-ID: <[EMAIL PROTECTED]>
Subject: Re: [HACKERS] tweaking MemSet() performance
In-Reply-To: <[EMAIL PROTECTED]>
To: Neil Conway <[EMAIL PROTECTED]>
Date: Thu, 29 Aug 2002 15:37:26 -0400 (EDT)
cc: PostgreSQL Hackers <[EMAIL PROTECTED]>


I consider this a very good test.  My last test is dated 1997/09/11; I
think I had a dual Pentium Pro at that point, and hardware has
certainly changed since then.  I did try 128 at that time and found it
to be slower, but with newer hardware it is very possible that has
changed.

I remember how surprised I was, when writing that macro, that there was
any improvement at all, but obviously there is a gain and the gain is
getting bigger.

I tested the following program:
                
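        #include <stdlib.h>
        #include <string.h>
        #include "postgres.h"
        
        /* raise the limit so MemSet() can use its loop at every tested size */
        #undef  MEMSET_LOOP_LIMIT
        #define MEMSET_LOOP_LIMIT  1000000
        
        int
        main(int argc, char **argv)
        {
                /* buffer size in bytes; falls back to 64 if no argument is given */
                int             len = (argc > 1) ? atoi(argv[1]) : 64;
                char            buffer[len];
                long long       i;
        
                for (i = 0; i < 9900000; i++)
                        MemSet(buffer, 0, len);
                return 0;
        }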

and, yes, -O2 is significant!  Looks like we use -O2 on all platforms
that use GCC so we should be OK there.

I tested with the following script:

        for TIME in 64 128 256 512 1024 2048 4096; do echo "*$TIME\c";
        time tst1 $TIME; done

and got for MemSet:
        
        *64
        real    0m1.001s
        user    0m1.000s
        sys     0m0.003s
        *128
        real    0m1.578s
        user    0m1.567s
        sys     0m0.013s
        *256
        real    0m2.723s
        user    0m2.723s
        sys     0m0.003s
        *512
        real    0m5.044s
        user    0m5.029s
        sys     0m0.013s
        *1024
        real    0m9.621s
        user    0m9.621s
        sys     0m0.003s
        *2048
        real    0m18.821s
        user    0m18.811s
        sys     0m0.013s
        *4096
        real    0m37.266s
        user    0m37.266s
        sys     0m0.003s

and for memset():
        
        *64
        real    0m1.813s
        user    0m1.801s
        sys     0m0.014s
        *128
        real    0m2.489s
        user    0m2.499s
        sys     0m0.994s
        *256
        real    0m4.397s
        user    0m5.389s
        sys     0m0.005s
        *512
        real    0m5.186s
        user    0m6.170s
        sys     0m0.015s
        *1024
        real    0m6.676s
        user    0m6.676s
        sys     0m0.003s
        *2048
        real    0m9.766s
        user    0m9.776s
        sys     0m0.994s
        *4096
        real    0m15.970s
        user    0m15.954s
        sys     0m0.003s

so for BSD/OS, the break-even is 512.
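
For anyone who would rather collect both sets of numbers from one
binary than from the shell's time, the following is a rough sketch
using clock().  The names fill_with_loop(), fill_with_memset(), and
time_fill() are hypothetical, and the loop is a stand-in for MemSet()
rather than the macro itself.  Note the assumption: both fills are
called through a volatile function pointer so the optimizer cannot
drop them, which means both pay call overhead.  It therefore compares
only the fill routines, not the call the macro saves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*fill_fn) (void *buf, size_t len);

/* Library fill: whatever memset() the platform's libc provides. */
static void
fill_with_memset(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Stand-in for the MemSet() loop: one long-sized store per iteration. */
static void
fill_with_loop(void *buf, size_t len)
{
    long       *p = buf;
    long       *stop = (long *) ((char *) buf + len);

    while (p < stop)
        *p++ = 0;
}

/* Returns CPU seconds spent filling len bytes, iterations times. */
static double
time_fill(fill_fn fn, size_t len, long iterations)
{
    static long buffer[4096 / sizeof(long)];
    fill_fn volatile call = fn;     /* forces a real call each time */
    clock_t     start;
    long        i;

    assert(len <= sizeof(buffer) && len % sizeof(long) == 0);

    start = clock();
    for (i = 0; i < iterations; i++)
        call(buffer, len);
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_fill() with each function for 64 through 4096 bytes and
printing the two columns side by side would show where they cross on a
given machine.  I have not attached numbers from this sketch.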

I am on a dual P3/550 using GCC 2.95.2.  I can tell you exactly why my
break-even is lower than most: I have assembly language memset()
functions in libc on BSD/OS.

I suggest changing the MEMSET_LOOP_LIMIT to 512.
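
As a rough picture of what that limit controls, here is a simplified
sketch of the dispatch.  It is not the c.h macro: sketch_memset() and
SKETCH_LOOP_LIMIT are hypothetical names, and the alignment and
zero-value conditions are my assumptions about when a word loop is
safe.  It returns 1 when the loop ran and 0 when it fell back to
memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value: the limit proposed in this message. */
#define SKETCH_LOOP_LIMIT 512

/* Hypothetical stand-in for MemSet(); see the caveats in the text. */
static int
sketch_memset(void *start, int val, size_t len)
{
    assert(start != NULL);

    /* Use the word loop only for aligned, zero-valued, small fills. */
    if ((uintptr_t) start % sizeof(long) == 0 &&
        len % sizeof(long) == 0 &&
        val == 0 &&
        len <= SKETCH_LOOP_LIMIT)
    {
        long       *p = start;
        long       *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
        return 1;
    }

    /* Everything else goes to the library routine. */
    memset(start, val, len);
    return 0;
}
```

With the limit at 512, a 512-byte aligned zero fill takes the loop and
a 1024-byte one goes to memset(), which matches the crossover in the
BSD/OS numbers above.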

---------------------------------------------------------------------------

Neil Conway wrote:
> In include/c.h, MemSet() is defined to be different than the stock
> function memset() only when copying less than or equal to
> MEMSET_LOOP_LIMIT bytes (currently 64). The comments above the macro
> definition note:
> 
>  *    We got the 64 number by testing this against the stock memset() on
>  *    BSD/OS 3.0. Larger values were slower.  bjm 1997/09/11
>  *
>  *    I think the crossover point could be a good deal higher for
>  *    most platforms, actually.  tgl 2000-03-19
> 
> I decided to investigate Tom's suggestion and determine the
> performance of MemSet() versus memset() on my machine, for various
> values of MEMSET_LOOP_LIMIT. The machine this is being tested on is a
> Pentium 4 1.8 GHz with RDRAM, running Linux 2.4.19pre8 with GCC 3.1.1
> and glibc 2.2.5 -- the results may or may not apply to other
> machines.
> 
> The test program was:
> 
> #include <string.h>
> #include "postgres.h"
> 
> #undef MEMSET_LOOP_LIMIT
> #define MEMSET_LOOP_LIMIT BUFFER_SIZE
> 
> int
> main(void)
> {
>       char buffer[BUFFER_SIZE];
>       long long i;
> 
>       for (i = 0; i < 99000000; i++)
>       {
>               MemSet(buffer, 0, sizeof(buffer));
>       }
> 
>       return 0;
> }
> 
> (I manually changed MemSet() to memset() when testing the performance
> of the latter function.)
> 
> It was compiled like so:
> 
>         gcc -O2 -DBUFFER_SIZE=xxx -Ipgsql/src/include memset.c
> 
> (The -O2 optimization flag is important: the results are significantly
> different if it is not used.)
> 
> Here are the results (each timing is the 'total' listing from 'time
> ./a.out'):
> 
> BUFFER_SIZE = 64
>         MemSet() -> 2.756, 2.810, 2.789
>         memset() -> 13.844, 13.782, 13.778
> 
> BUFFER_SIZE = 128
>         MemSet() -> 5.848, 5.989, 5.861
>         memset() -> 15.637, 15.631, 15.631
> 
> BUFFER_SIZE = 256
>         MemSet() -> 9.602, 9.652, 9.633
>         memset() -> 19.305, 19.370, 19.302
> 
> BUFFER_SIZE = 512
>         MemSet() -> 17.416, 17.462, 17.353
>         memset() -> 26.657, 26.658, 26.678
> 
> BUFFER_SIZE = 1024
>         MemSet() -> 32.144, 32.179, 32.086
>         memset() -> 41.186, 41.115, 41.176
> 
> BUFFER_SIZE = 2048
>         MemSet() -> 60.39, 60.48, 60.32
>         memset() -> 71.19, 71.18, 71.17
> 
> BUFFER_SIZE = 4096
>         MemSet() -> 118.29, 120.07, 118.69
>         memset() -> 131.40, 131.41
> 
> ... at which point I stopped benchmarking.
> 
> Is the benchmark above a reasonable assessment of memset() / MemSet()
> performance when copying word-aligned amounts of memory? If so, what's
> a good value for MEMSET_LOOP_LIMIT (perhaps 512)?
> 
> Also, if anyone would like to contribute the results of doing the
> benchmark on their particular system, that might provide some useful
> additional data points.
> 
> Cheers,
> 
> Neil
> 
> -- 
> Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC
> 
> 
> 

