On Mon, Apr 23, 2001 at 11:34:35PM +0200, Andrea Arcangeli wrote:
> On Mon, Apr 23, 2001 at 09:35:34PM +0100, D . W . Howells wrote:
> > This patch (made against linux-2.4.4-pre6) makes a number of changes to the
> > rwsem implementation:
> >
> > (1) Everything in try #2
> >
> > plus
> >
> > (2) Changes proposed by Linus for the generic semaphore code.
> >
> > (3) Ideas from Andrea and how he implemented his semaphores.
>
> I benchmarked try3 on top of pre6 and I get this:
>
> ----------------------------------------------
> RWSEM_GENERIC_SPINLOCK y in rwsem-2.4.4-pre6 + your latest #try3
>
> rw
>
> reads taken: 5842496
> writes taken: 3016649
> reads taken: 5823381
> writes taken: 3006773
>
> r1
>
> reads taken: 13309316
> reads taken: 13311722
>
> r2
>
> reads taken: 5010534
> reads taken: 5023185
>
> ro
>
> reads taken: 3850228
> reads taken: 3845954
>
> w1
>
> writes taken: 13012701
> writes taken: 13021716
>
> wo
>
> writes taken: 1825789
> writes taken: 1802560
>
> ----------------------------------------------
> RWSEM_XCHGADD y in rwsem-2.4.4-pre6 + your latest #try3
>
> rw
>
> reads taken: 5789542
> writes taken: 2989478
> reads taken: 5801777
> writes taken: 2995669
>
> r1
>
> reads taken: 16922653
> reads taken: 16946132
>
> r2
>
> reads taken: 5650211
> reads taken: 5647272
>
> ro
>
> reads taken: 4956250
> reads taken: 4959828
>
> w1
>
> writes taken: 15431139
> writes taken: 15439790
>
> wo
>
> writes taken: 813756
> writes taken: 816005
>
> graph updated attached. so in short my fast path is still quite faster (r1/w1),
> slow path is comparable now (I still win in all tests but wo which is probably
> the less interesting one in real life [write contention]). I still have room to
> improve the wo test [write contention] by spending more icache btw but it
> probably doesn't worth.
Ok I finished now my asm optimized rwsemaphores and I improved a little my
spinlock based one but without touching the icache usage.
Here it is the code against pre6 vanilla:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre6/rwsem-8
here the same but against David's try#2:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre6/rwsem-8-against-dh-try2
and here again the same but against David's latest try#3:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre6/rwsem-8-against-dh-try3
The main advantage of my rewrite are:
- my x86 version is visibly faster than yours in the write fast path and it
also saves icache because it's smaller (read fast path is reproducibly a bit
faster too)
- my spinlock version is visibly faster in both read and write fast path and
still
has smaller icache footprint (of course my version is completly out of line too
and I'm comparing apples to apples)
- at least to me the slow path of my code seems to be much simpler
than yours (and I can actually understand it ;), for example I don't feel
any need of any atomic exchange anywhere, I admit now comparing
my code to yours I still don't know why you need it
- the common code automatically extends itself to support 2^32 concurrent
sleepers
on 64bit archs
- there is no code duplication at all in supporting xchgadd common code logic
for other archs (and I prepared a skeleton to fill for the alpha)
Only disavantage is that sparc64 will have to fixup some bit again.
Here the numbers of the above patches on top of vanilla 2.4.4-pre6:
----------------------------------------------
RWSEM_XCHGADD y in rwsem-2.4.4-pre6 + my latest rwsem
rw
reads taken: 5736978
writes taken: 2962325
reads taken: 5799163
writes taken: 2994404
r1
reads taken: 17044842
reads taken: 17053405
r2
reads taken: 5603085
reads taken: 5601647
ro
reads taken: 4831655
reads taken: 4833518
w1
writes taken: 16064773
writes taken: 16037018
wo
writes taken: 860791
writes taken: 864103
----------------------------------------------
RWSEM_SPINLOCK y in rwsem-2.4.4-pre6 + my latest rwsem
rw
reads taken: 6061713
writes taken: 3129801
reads taken: 6099046
writes taken: 3148951
r1
reads taken: 14251500
reads taken: 14265389
r2
reads taken: 4972932
reads taken: 4936267
ro
reads taken: 4253814
reads taken: 4253432
w1
writes taken: 13652385
writes taken: 13632914
wo
writes taken: 1751857
writes taken: 1753608
I draw a graph with the above numbers compared to the previous quoted numbers I
generated a few hours ago with your latest try#3 (attached). (W1 and R1 are
respectively the benchmark of the write fast path and read fast path, only one
writer and only one reader and they're of course the most interesting ones)
So I'd suggest Linus to apply one of my above -8 patches for pre7. (I hope I
won't need any secondary try)
this is a diffstat of the rwsem-8 patch:
arch/alpha/config.in | 1
arch/alpha/kernel/alpha_ksyms.c | 4
arch/arm/config.in | 2
arch/cris/config.in | 1
arch/i386/config.in | 4
arch/ia64/config.in | 1
arch/m68k/config.in | 1
arch/mips/config.in | 1
arch/mips64/config.in | 1
arch/parisc/config.in | 1
arch/ppc/config.in | 1
arch/ppc/kernel/ppc_ksyms.c | 2
arch/s390/config.in | 1
arch/s390x/config.in | 1
arch/sh/config.in | 1
arch/sparc/config.in | 1
arch/sparc64/config.in | 4
include/asm-alpha/compiler.h | 9 -
include/asm-alpha/rwsem_xchgadd.h | 27 +++
include/asm-alpha/semaphore.h | 2
include/asm-i386/rwsem.h | 225 --------------------------------
include/asm-i386/rwsem_xchgadd.h | 93 +++++++++++++
include/asm-sparc64/rwsem.h | 35 ++---
include/linux/compiler.h | 13 +
include/linux/rwsem-spinlock.h | 57 --------
include/linux/rwsem.h | 106 ---------------
include/linux/rwsem_spinlock.h | 61 ++++++++
include/linux/rwsem_xchgadd.h | 96 +++++++++++++
include/linux/sched.h | 2
lib/Makefile | 6
lib/rwsem-spinlock.c | 245 -----------------------------------
lib/rwsem.c | 265 --------------------------------------
lib/rwsem_spinlock.c | 124 +++++++++++++++++
lib/rwsem_xchgadd.c | 88 ++++++++++++
35 files changed, 531 insertions(+), 951 deletions(-)
Andrea
rwsem-8.png