rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

Andrea Arcangeli Mon, 23 Apr 2001 21:40:03 -0700
On Mon, Apr 23, 2001 at 11:34:35PM +0200, Andrea Arcangeli wrote:
> On Mon, Apr 23, 2001 at 09:35:34PM +0100, D . W . Howells wrote:
> > This patch (made against linux-2.4.4-pre6) makes a number of changes to the
> > rwsem implementation:
> > 
> >  (1) Everything in try #2
> > 
> > plus
> > 
> >  (2) Changes proposed by Linus for the generic semaphore code.
> > 
> >  (3) Ideas from Andrea and how he implemented his semaphores.
> 
> I benchmarked try3 on top of pre6 and I get this:
> 
> ----------------------------------------------
> RWSEM_GENERIC_SPINLOCK y in rwsem-2.4.4-pre6 + your latest #try3
> 
> rw
> 
> reads taken: 5842496
> writes taken: 3016649
> reads taken: 5823381
> writes taken: 3006773
> 
> r1
> 
> reads taken: 13309316
> reads taken: 13311722
> 
> r2
> 
> reads taken: 5010534
> reads taken: 5023185
> 
> ro
> 
> reads taken: 3850228
> reads taken: 3845954
> 
> w1
> 
> writes taken: 13012701
> writes taken: 13021716
> 
> wo
> 
> writes taken: 1825789
> writes taken: 1802560
> 
> ----------------------------------------------
> RWSEM_XCHGADD y in rwsem-2.4.4-pre6 + your latest #try3
> 
> rw
> 
> reads taken: 5789542
> writes taken: 2989478
> reads taken: 5801777
> writes taken: 2995669
> 
> r1
> 
> reads taken: 16922653
> reads taken: 16946132
> 
> r2
> 
> reads taken: 5650211
> reads taken: 5647272
> 
> ro
> 
> reads taken: 4956250
> reads taken: 4959828
> 
> w1
> 
> writes taken: 15431139
> writes taken: 15439790
> 
> wo
> 
> writes taken: 813756
> writes taken: 816005
> 
> graph updated attached. so in short my fast path is still quite faster (r1/w1),
> slow path is comparable now (I still win in all tests but wo which is probably
> the less interesting one in real life [write contention]). I still have room to
> improve the wo test [write contention] by spending more icache btw but it
> probably doesn't worth.

Ok I finished now my asm optimized rwsemaphores and I improved a little my
spinlock based one but without touching the icache usage.

Here it is the code against pre6 vanilla:

        
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre6/rwsem-8

here the same but against David's try#2:

        
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre6/rwsem-8-against-dh-try2

and here again the same but against David's latest try#3:

        
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre6/rwsem-8-against-dh-try3

The main advantage of my rewrite are:

-       my x86 version is visibly faster than yours in the write fast path and it
        also saves icache because it's smaller (read fast path is reproducibly a bit 
faster too)

-       my spinlock version is visibly faster in both read and write fast path and 
still
        has smaller icache footprint (of course my version is completly out of line too
        and I'm comparing apples to apples)

-       at least to me the slow path of my code seems to be much simpler
        than yours (and I can actually understand it ;), for example I don't feel
        any need of any atomic exchange anywhere, I admit now comparing
        my code to yours I still don't know why you need it

-       the common code automatically extends itself to support 2^32 concurrent 
sleepers
        on 64bit archs

-       there is no code duplication at all in supporting xchgadd common code logic
        for other archs (and I prepared a skeleton to fill for the alpha)

Only disavantage is that sparc64 will have to fixup some bit again.

Here the numbers of the above patches on top of vanilla 2.4.4-pre6:

----------------------------------------------
RWSEM_XCHGADD y in rwsem-2.4.4-pre6 + my latest rwsem

rw

reads taken: 5736978
writes taken: 2962325
reads taken: 5799163
writes taken: 2994404

r1

reads taken: 17044842
reads taken: 17053405

r2

reads taken: 5603085
reads taken: 5601647

ro

reads taken: 4831655
reads taken: 4833518

w1

writes taken: 16064773
writes taken: 16037018

wo

writes taken: 860791
writes taken: 864103

----------------------------------------------
RWSEM_SPINLOCK y in rwsem-2.4.4-pre6 + my latest rwsem

rw

reads taken: 6061713
writes taken: 3129801
reads taken: 6099046
writes taken: 3148951

r1

reads taken: 14251500
reads taken: 14265389

r2

reads taken: 4972932
reads taken: 4936267

ro

reads taken: 4253814
reads taken: 4253432

w1

writes taken: 13652385
writes taken: 13632914

wo

writes taken: 1751857
writes taken: 1753608

I draw a graph with the above numbers compared to the previous quoted numbers I
generated a few hours ago with your latest try#3 (attached). (W1 and R1 are
respectively the benchmark of the write fast path and read fast path, only one
writer and only one reader and they're of course the most interesting ones)

So I'd suggest Linus to apply one of my above -8 patches for pre7. (I hope I
won't need any secondary try)

this is a diffstat of the rwsem-8 patch:

 arch/alpha/config.in              |    1 
 arch/alpha/kernel/alpha_ksyms.c   |    4 
 arch/arm/config.in                |    2 
 arch/cris/config.in               |    1 
 arch/i386/config.in               |    4 
 arch/ia64/config.in               |    1 
 arch/m68k/config.in               |    1 
 arch/mips/config.in               |    1 
 arch/mips64/config.in             |    1 
 arch/parisc/config.in             |    1 
 arch/ppc/config.in                |    1 
 arch/ppc/kernel/ppc_ksyms.c       |    2 
 arch/s390/config.in               |    1 
 arch/s390x/config.in              |    1 
 arch/sh/config.in                 |    1 
 arch/sparc/config.in              |    1 
 arch/sparc64/config.in            |    4 
 include/asm-alpha/compiler.h      |    9 -
 include/asm-alpha/rwsem_xchgadd.h |   27 +++
 include/asm-alpha/semaphore.h     |    2 
 include/asm-i386/rwsem.h          |  225 --------------------------------
 include/asm-i386/rwsem_xchgadd.h  |   93 +++++++++++++
 include/asm-sparc64/rwsem.h       |   35 ++---
 include/linux/compiler.h          |   13 +
 include/linux/rwsem-spinlock.h    |   57 --------
 include/linux/rwsem.h             |  106 ---------------
 include/linux/rwsem_spinlock.h    |   61 ++++++++
 include/linux/rwsem_xchgadd.h     |   96 +++++++++++++
 include/linux/sched.h             |    2 
 lib/Makefile                      |    6 
 lib/rwsem-spinlock.c              |  245 -----------------------------------
 lib/rwsem.c                       |  265 --------------------------------------
 lib/rwsem_spinlock.c              |  124 +++++++++++++++++
 lib/rwsem_xchgadd.c               |   88 ++++++++++++
 35 files changed, 531 insertions(+), 951 deletions(-)

Andrea
rwsem-8.png
rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

Reply via email to