* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> Measurements on IA64 slub w/per cpu vs slub w/per cpu/cmpxchg_local
> emulation. Results are not good:
>
Hi Christoph,
I tried to come up with a patch set implementing the basics of a new
critical section: local_enter(flags) and local_exit(flags)
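
For reference, a minimal sketch of what such primitives could look like (the
actual patch body is not quoted here; the config symbol and the preempt-based
variant are assumptions, not the posted code):

	#ifdef CONFIG_HAVE_CMPXCHG_LOCAL	/* assumed symbol */
	/* a fast cmpxchg_local exists: pinning the task to its CPU is enough */
	#define local_enter(flags)	do { (void)(flags); preempt_disable(); } while (0)
	#define local_exit(flags)	do { (void)(flags); preempt_enable(); } while (0)
	#else
	/* no cheap local atomic: fall back to masking local interrupts */
	#define local_enter(flags)	local_irq_save(flags)
	#define local_exit(flags)	local_irq_restore(flags)
	#endif
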
On Tue, 2007-08-28 at 12:36 -0700, Christoph Lameter wrote:
> On Tue, 28 Aug 2007, Peter Zijlstra wrote:
>
> > On Mon, 2007-08-27 at 15:15 -0700, Christoph Lameter wrote:
> > > H. One wild idea would be to use a priority futex for the slab lock?
> > > That would make the slow paths interrupt
On Tue, 28 Aug 2007, Mathieu Desnoyers wrote:
> Ok, I just had a look at ia64 instruction set, and I fear that cmpxchg
> must always come with the acquire or release semantic. Is there any
> cmpxchg equivalent on ia64 that would be acquire and release semantic
> free ? This implicit memory orderin
On Tue, 28 Aug 2007, Peter Zijlstra wrote:
> On Mon, 2007-08-27 at 15:15 -0700, Christoph Lameter wrote:
> > H. One wild idea would be to use a priority futex for the slab lock?
> > That would make the slow paths interrupt safe without requiring interrupt
> > disable? Does a futex fit into t
Ok, I just had a look at ia64 instruction set, and I fear that cmpxchg
must always come with the acquire or release semantic. Is there any
cmpxchg equivalent on ia64 that is free of acquire and release semantics?
This implicit memory ordering in the instruction seems to be
responsible for the sl
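
For context, the "cmpxchg_local emulation" benchmarked at the top of the
thread amounts to a compare-and-exchange that is protected only against
local interrupts; roughly (a sketch, not the exact kernel code):

	static inline unsigned long cmpxchg_local_emul(unsigned long *ptr,
			unsigned long old, unsigned long new)
	{
		unsigned long flags, prev;

		local_irq_save(flags);		/* only local interrupts matter */
		prev = *ptr;
		if (prev == old)
			*ptr = new;
		local_irq_restore(flags);
		return prev;			/* success iff prev == old */
	}
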
On Mon, 2007-08-27 at 15:15 -0700, Christoph Lameter wrote:
> H. One wild idea would be to use a priority futex for the slab lock?
> That would make the slow paths interrupt safe without requiring interrupt
> disable? Does a futex fit into the page struct?
Very much puzzled at what you propo
Measurements on IA64 slub w/per cpu vs slub w/per cpu/cmpxchg_local
emulation. Results are not good:
slub/per cpu
1 times kmalloc(8)/kfree -> 105 cycles
1 times kmalloc(16)/kfree -> 104 cycles
1 times kmalloc(32)/kfree -> 105 cycles
1 times kmalloc(64)/kfree -> 104 cycles
1 ti
On Mon, 27 Aug 2007, Mathieu Desnoyers wrote:
> Hrm, I just want to verify one thing: a lot of code paths seem to go
> to the slow path without requiring cmpxchg_local to execute at all. So
> is the slow path more likely to be triggered by the (!object),
> (!node_match) tests or by these same te
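
For readers following along, a reconstructed sketch of the fast-path tests
being asked about (helper and field names follow the slub.c of that period,
but this is a simplification, not the posted patch). Only when both tests
pass does cmpxchg_local() run; an empty cpu freelist or a node mismatch
diverts to __slab_alloc() without any cmpxchg at all:

	static __always_inline void *slab_alloc_sketch(struct kmem_cache *s,
			gfp_t gfpflags, int node, void *addr)
	{
		struct kmem_cache_cpu *c;
		void **object;

		preempt_disable();
		c = get_cpu_slab(s, smp_processor_id());
	redo:
		object = c->freelist;
		if (unlikely(!object || !node_match(c, node))) {
			/* slow path: no cmpxchg_local at all on this branch */
			object = __slab_alloc(s, gfpflags, node, addr, c);
		} else if (cmpxchg_local(&c->freelist, object,
					 object[c->offset]) != object) {
			/* an interrupt changed the freelist: retry */
			goto redo;
		}
		preempt_enable();
		return object;
	}
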
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Mon, 27 Aug 2007, Mathieu Desnoyers wrote:
>
> > > The slow path would require disable preemption and two interrupt disables.
> > If the slow path has to call new_slab, then yes. But it seems that not
> > every slow path must call it, so for the
H. One wild idea would be to use a priority futex for the slab lock?
That would make the slow paths interrupt safe without requiring interrupt
disable? Does a futex fit into the page struct?
On Mon, 27 Aug 2007, Mathieu Desnoyers wrote:
> > The slow path would require disable preemption and two interrupt disables.
> If the slow path has to call new_slab, then yes. But it seems that not
> every slow path must call it, so for the other slow paths, only one
> interrupt disable would be
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Mon, 27 Aug 2007, Mathieu Desnoyers wrote:
>
> > > a clean solution source code wise. It also minimizes the interrupt
> > > holdoff
> > > for the non-cmpxchg_local arches. However, it means that we will have to
> > > disable interrupts twice f
On Mon, 27 Aug 2007, Mathieu Desnoyers wrote:
> > a clean solution source code wise. It also minimizes the interrupt holdoff
> > for the non-cmpxchg_local arches. However, it means that we will have to
> > disable interrupts twice for the slow path. If that is too expensive then
> > we need a d
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> I think the simplest solution may be to leave slub as done in the patch
> that we developed last week. The arch must provide a cmpxchg_local that is
> performance wise the fastest possible. On x86 this is going to be the
> cmpxchg_local on others
I think the simplest solution may be to leave slub as done in the patch
that we developed last week. The arch must provide a cmpxchg_local that is
performance wise the fastest possible. On x86 this is going to be the
cmpxchg_local on others where cmpxchg is slower than interrupt
disable/enable
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Mon, 27 Aug 2007, Mathieu Desnoyers wrote:
>
> > * Christoph Lameter ([EMAIL PROTECTED]) wrote:
> > > On Mon, 27 Aug 2007, Peter Zijlstra wrote:
> > >
> > > > So, if the fast path can be done with a preempt off, it might be doable
> > > > to suf
On Mon, 27 Aug 2007, Mathieu Desnoyers wrote:
> * Christoph Lameter ([EMAIL PROTECTED]) wrote:
> > On Mon, 27 Aug 2007, Peter Zijlstra wrote:
> >
> > > So, if the fast path can be done with a preempt off, it might be doable
> > > to suffer the slow path with a per cpu lock like that.
> >
> > Sad
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Mon, 27 Aug 2007, Peter Zijlstra wrote:
>
> > So, if the fast path can be done with a preempt off, it might be doable
> > to suffer the slow path with a per cpu lock like that.
>
> Sadly the cmpxchg_local requires local per cpu data access. Isn't
On Mon, 27 Aug 2007, Peter Zijlstra wrote:
> So, if the fast path can be done with a preempt off, it might be doable
> to suffer the slow path with a per cpu lock like that.
Sadly the cmpxchg_local requires local per cpu data access. Isn't there
some way to make this less expensive on RT? Acessin
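
One reading of the per-cpu-lock idea above, purely as a sketch (the lock and
its placement are assumptions, not code from any posted patch): keep the
cmpxchg_local fast path under preempt-off, and let the slow path serialize
on a lock tied to the per-cpu structure instead of disabling interrupts:

	/* hypothetical: one lock per kmem_cache_cpu, taken only by the slow path */
	static void slow_path_under_local_lock(struct kmem_cache_cpu *c,
					       spinlock_t *lock)
	{
		spin_lock(lock);	/* on -rt this lock may sleep, so no
					 * interrupt-off section is required */
		/* refill or flush c->freelist here */
		spin_unlock(lock);
	}
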
On Tue, 2007-08-21 at 16:14 -0700, Christoph Lameter wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
>
> > - Changed smp_rmb() for barrier(). We are not interested in read order
> > across cpus, what we want is to be ordered wrt local interrupts only.
> > barrier() is much cheaper than
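
A minimal illustration of that argument (a generic example, not slub code):
against an interrupt handler on the same CPU only compiler reordering
matters, so a plain barrier() suffices where smp_rmb() would additionally,
and needlessly, order the loads against other CPUs.

	static int data, ready;

	/* runs in an interrupt handler on this CPU */
	static void producer(void)
	{
		data = 42;
		barrier();	/* keep the compiler from reordering the stores */
		ready = 1;
	}

	/* runs in task context on the same CPU */
	static int consumer(void)
	{
		if (ready) {
			barrier();	/* keep the compiler from hoisting the load;
					 * same-CPU ordering needs nothing stronger */
			return data;
		}
		return -1;
	}
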
Ok so we need this.
Fix up preempt checks.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
mm/slub.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Wed, 22 Aug 2007, Mathieu Desnoyers wrote:
>
> > * Christoph Lameter ([EMAIL PROTECTED]) wrote:
> > > void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
> > > @@ -1577,7 +1590,10 @@ static void __slab_free(struct kmem_cach
> > > {
> >
On Wed, 22 Aug 2007, Mathieu Desnoyers wrote:
> > Then the thread could be preempted and rescheduled on a different cpu
> > between put_cpu and local_irq_save() which means that we lose the
> > state information of the kmem_cache_cpu structure.
> >
>
> Maybe I am misunderstanding something, bu
On Wed, 22 Aug 2007, Mathieu Desnoyers wrote:
> * Christoph Lameter ([EMAIL PROTECTED]) wrote:
> > void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
> > @@ -1577,7 +1590,10 @@ static void __slab_free(struct kmem_cach
> > {
> > void *prior;
> > void **object = (void *)x;
> > +
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
> @@ -1577,7 +1590,10 @@ static void __slab_free(struct kmem_cach
> {
> void *prior;
> void **object = (void *)x;
> + unsigned long flags;
>
> + local_irq_save(flags
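
The hunk quoted above is cut off; the pattern it introduces is roughly the
following (a sketch of the shape only, not the full diff):

	static void __slab_free(struct kmem_cache *s, struct page *page,
				void *x, void *addr)
	{
		void *prior;
		void **object = (void *)x;
		unsigned long flags;

		local_irq_save(flags);		/* added: the caller no longer
						 * disables interrupts for us */
		slab_lock(page);
		/* ... original free slow path using prior/object ... */
		slab_unlock(page);
		local_irq_restore(flags);
	}
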
Here is the current cmpxchg_local version that I used for testing.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
include/linux/slub_def.h | 10 +++---
mm/slub.c                | 74 ---
2 files changed, 56 insertions(+), 28 deletions(-)
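
The diff body is not reproduced here; for orientation, the per-cpu structure
this patch operates on looked roughly like the following in
include/linux/slub_def.h at the time (a sketch; field order and additional
fields may differ):

	struct kmem_cache_cpu {
		void **freelist;	/* what the cmpxchg_local fast path works on */
		struct page *page;	/* slab the freelist belongs to */
		int node;		/* node of the current slab page */
		unsigned int offset;	/* free-pointer offset inside each object,
					 * in word units */
		unsigned int objsize;	/* object size without metadata */
	};
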
I can confirm Mathieu's measurement now:
Athlon64:
regular NUMA/discontig
1. Kmalloc: Repeatedly allocate then free test
1 times kmalloc(8) -> 79 cycles kfree -> 92 cycles
1 times kmalloc(16) -> 79 cycles kfree -> 93 cycles
1 times kmalloc(32) -> 88 cycles kfree -> 95 cycles
1 ti
Measurements on a AMD64 2.0 GHz dual-core
In this test, we seem to remove about 10 cycles from the kmalloc fast path.
On small allocations, that gives a 14% performance increase. The kfree fast
path also seems to show a 10-cycle improvement.
1. Kmalloc: Repeatedly allocate then free test
* cmpxchg_local sl
On Wed, Aug 22, 2007 at 09:45:33AM -0400, Mathieu Desnoyers wrote:
> Measurements on a AMD64 2.0 GHz dual-core
>
> In this test, we seem to remove 10 cycles from the kmalloc fast path.
> On small allocations, it gives a 14% performance increase. kfree fast
> path also seems to have a 10 cycles imp
On Tue, Aug 21, 2007 at 06:06:19PM -0700, Christoph Lameter wrote:
> Ok. Measurements vs. simple cmpxchg on a Intel(R) Pentium(R) 4 CPU 3.20GHz
Note the P4 is an extreme case in that "unusual" instructions are
quite slow (basically anything that falls out of the trace cache). Core2
tends to be mu
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
>
> > As I am going back through the initial cmpxchg_local implementation, it
> > seems like it was executing __slab_alloc() with preemption disabled,
> > which is wrong. new_slab() is not designed for t
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> Ok. Measurements vs. simple cmpxchg on a Intel(R) Pentium(R) 4 CPU 3.20GHz
> (hyperthreading enabled). Test run with your module show only minor
> performance improvements and lots of regressions. So we must have
> cmpxchg_local to see any improve
Ok. Measurements vs. simple cmpxchg on a Intel(R) Pentium(R) 4 CPU 3.20GHz
(hyperthreading enabled). Test run with your module show only minor
performance improvements and lots of regressions. So we must have
cmpxchg_local to see any improvements? Some kind of a recent optimization
of cmpxchg p
* Andi Kleen ([EMAIL PROTECTED]) wrote:
> Mathieu Desnoyers <[EMAIL PROTECTED]> writes:
> >
> > The measurements I get (in cycles):
> >                enable interrupts (STI)  disable interrupts (CLI)  local CMPXCHG
> > IA32 (P4)      112                      82                        26
> >
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> As I am going back through the initial cmpxchg_local implementation, it
> seems like it was executing __slab_alloc() with preemption disabled,
> which is wrong. new_slab() is not designed for that.
The version I sent you did not use preemption.
We
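
The point about new_slab() is that it goes to the page allocator and may
sleep; a sketch of the consequence for the slow path (an assumption about
the intended shape, not a quoted hunk):

	static struct page *grow_slab_sketch(struct kmem_cache *s,
					     gfp_t gfpflags, int node)
	{
		struct page *page;

		preempt_enable();			/* or local_irq_restore()   */
		page = new_slab(s, gfpflags, node);	/* may sleep for __GFP_WAIT */
		preempt_disable();			/* re-enter the atomic section;
							 * the per-cpu state must be
							 * re-checked afterwards    */
		return page;
	}
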
Mathieu Desnoyers <[EMAIL PROTECTED]> writes:
>
> The measurements I get (in cycles):
>                enable interrupts (STI)  disable interrupts (CLI)  local CMPXCHG
> IA32 (P4)      112                      82                        26
> x86_64 AMD64   125                      102
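
Such per-primitive numbers can be taken with a loop like the following (a
sketch only; the loop count is an assumption, and get_cycles_sync() is the
better choice on x86_64 as noted elsewhere in the thread):

	#define LOOPS	10000

	static void time_primitives(void)
	{
		unsigned long flags, var = 0;
		cycles_t t0, t1;
		int i;

		local_irq_save(flags);

		t0 = get_cycles();
		for (i = 0; i < LOOPS; i++)
			local_irq_enable();			/* STI */
		t1 = get_cycles();
		printk(KERN_INFO "STI: %llu cycles\n",
		       (unsigned long long)(t1 - t0) / LOOPS);

		t0 = get_cycles();
		for (i = 0; i < LOOPS; i++)
			local_irq_disable();			/* CLI */
		t1 = get_cycles();
		printk(KERN_INFO "CLI: %llu cycles\n",
		       (unsigned long long)(t1 - t0) / LOOPS);

		t0 = get_cycles();
		for (i = 0; i < LOOPS; i++)
			cmpxchg_local(&var, 0, 0);		/* local CMPXCHG */
		t1 = get_cycles();
		printk(KERN_INFO "cmpxchg_local: %llu cycles\n",
		       (unsigned long long)(t1 - t0) / LOOPS);

		local_irq_restore(flags);
	}
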
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
>
> > - Rounding error.. you seem to round at 0.1ms, but I keep the values in
> > cycles. The times that you get (1.1ms) seem strangely higher than
> > mine, which are under 1000 cycles on a 3GHz sy
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> - Rounding error.. you seem to round at 0.1ms, but I keep the values in
> cycles. The times that you get (1.1ms) seem strangely higher than
> mine, which are under 1000 cycles on a 3GHz system (less than 333ns).
> I guess there is both a ms -
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
>
> > Are you running a UP or SMP kernel ? If you run a UP kernel, the
> > cmpxchg_local and cmpxchg are identical.
>
> UP.
>
> > Oh, and if you run your tests at boot time, the alternatives code may
>
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> Are you running a UP or SMP kernel ? If you run a UP kernel, the
> cmpxchg_local and cmpxchg are identical.
UP.
> Oh, and if you run your tests at boot time, the alternatives code may
> have removed the lock prefix, therefore making cmpxchg and cmp
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
>
> > Using cmpxchg_local vs cmpxchg has a clear impact on the fast paths, as
> > shown below: it saves about 60 to 70 cycles for kmalloc and 200 cycles
> > for the kmalloc/kfree pair (test 2).
>
> Hmmm
* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> * Christoph Lameter ([EMAIL PROTECTED]) wrote:
> > On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> >
> > > - Changed smp_rmb() for barrier(). We are not interested in read order
> > > across cpus, what we want is to be ordered wrt local interrupts
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> Using cmpxchg_local vs cmpxchg has a clear impact on the fast paths, as
> shown below: it saves about 60 to 70 cycles for kmalloc and 200 cycles
> for the kmalloc/kfree pair (test 2).
H.. I wonder if the AMD processors simply do the same in eith
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> kmalloc(8)/kfree = 112 cycles
> kmalloc(16)/kfree = 103 cycles
> kmalloc(32)/kfree = 103 cycles
> kmalloc(64)/kfree = 103 cycles
> kmalloc(128)/kfree = 112 cycles
> kmalloc(256)/kfree = 111 cycles
> kmalloc(512)/kfree = 111 cycles
> kmalloc(1024)/kfr
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
>
> > SLUB Use cmpxchg() everywhere.
> >
> > It applies to "SLUB: Single atomic instruction alloc/free using
> > cmpxchg".
>
> > +++ slab/mm/slub.c 2007-08-20 18:42:28.0 -0400
> > @@ -1682,7 +
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> * cmpxchg_local Slub test
> kmalloc(8) = 83 cycles      kfree = 363 cycles
> kmalloc(16) = 85 cycles     kfree = 372 cycles
> kmalloc(32) = 92 cycles     kfree = 377 cycles
> kmalloc(64) = 115 cycles    kfree = 397 cycles
> kmal
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
>
> > - Changed smp_rmb() for barrier(). We are not interested in read order
> > across cpus, what we want is to be ordered wrt local interrupts only.
> > barrier() is much cheaper than a rmb().
>
>
Reformatting...
* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> Hi Christoph,
>
> If you are interested in the raw numbers:
>
> The (very basic) test module follows. Make sure you change get_cycles()
> for get_cycles_sync() if you plan to run this on x86_64.
>
> (tests taken on a 3GHz Pentium
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> SLUB Use cmpxchg() everywhere.
>
> It applies to "SLUB: Single atomic instruction alloc/free using
> cmpxchg".
> +++ slab/mm/slub.c	2007-08-20 18:42:28.0 -0400
> @@ -1682,7 +1682,7 @@ redo:
>
> object[c->offset] = freelist;
>
>
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> - Changed smp_rmb() for barrier(). We are not interested in read order
> across cpus, what we want is to be ordered wrt local interrupts only.
> barrier() is much cheaper than a rmb().
But this means a preempt disable is required. RT users do no
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
>
> > - Fixed an erroneous test in slab_free() (logic was flipped from the
> > original code when testing for slow path. It explains the wrong
> > numbers you have with big free).
>
> If you look
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> Therefore, in the test where we have separate passes for slub allocation
> and free, we mostly hit the slow path. Any particular reason for that?
Maybe on SMP you are scheduled to run on a different processor? Note that
I ran my tests at early boot
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> If you are interested in the raw numbers:
>
> The (very basic) test module follows. Make sure you change get_cycles()
> for get_cycles_sync() if you plan to run this on x86_64.
Which test is which? Would you be able to format this in a way that we
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> - Fixed an erroneous test in slab_free() (logic was flipped from the
> original code when testing for slow path. It explains the wrong
> numbers you have with big free).
If you look at the numbers that I posted earlier then you will see that
* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> Ok, I played with your patch a bit, and the results are quite
> interesting:
>
...
> Summary:
>
> (tests repeated 1 times on a 3GHz Pentium 4)
> (kernel DEBUG menuconfig options are turned off)
> results are in cycles per iteration
> I did 2 ru
Hi Christoph,
If you are interested in the raw numbers:
The (very basic) test module follows. Make sure you change get_cycles()
for get_cycles_sync() if you plan to run this on x86_64.
(tests taken on a 3GHz Pentium 4)
* slub HEAD, test 1
[ 99.774699] SLUB Performance testing
[ 99.785431]
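
The module itself is not reproduced in this digest; in outline it does
something like the following (sizes, loop count and names are assumptions,
the real module covers more sizes and test variants):

	#include <linux/module.h>
	#include <linux/slab.h>
	#include <linux/timex.h>	/* cycles_t, get_cycles() */

	#define TEST_COUNT	10000

	static void *objs[TEST_COUNT];

	static int __init slub_test_init(void)
	{
		cycles_t t0, t1;
		int i;

		printk(KERN_INFO "SLUB Performance testing\n");

		t0 = get_cycles();	/* use get_cycles_sync() on x86_64 */
		for (i = 0; i < TEST_COUNT; i++)
			objs[i] = kmalloc(8, GFP_KERNEL);
		t1 = get_cycles();
		printk(KERN_INFO "%d times kmalloc(8) -> %llu cycles\n",
		       TEST_COUNT, (unsigned long long)(t1 - t0) / TEST_COUNT);

		t0 = get_cycles();
		for (i = 0; i < TEST_COUNT; i++)
			kfree(objs[i]);
		t1 = get_cycles();
		printk(KERN_INFO "%d times kfree -> %llu cycles\n",
		       TEST_COUNT, (unsigned long long)(t1 - t0) / TEST_COUNT);

		return 0;
	}
	module_init(slub_test_init);
	MODULE_LICENSE("GPL");
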
Ok, I played with your patch a bit, and the results are quite
interesting:
SLUB use cmpxchg_local
my changes:
- Fixed an erroneous test in slab_free() (logic was flipped from the
original code when testing for slow path. It explains the wrong
numbers you have with big free).
- Use cmpxchg_l
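
To make the "flipped test" concrete, a sketch of the slab_free() fast/slow
split in the cmpxchg_local variant (simplified; names follow the thread and
the exact condition is an assumption). With the condition inverted, every
kfree() drops into __slab_free(), which is consistent with the inflated
free numbers quoted earlier:

	static __always_inline void slab_free_sketch(struct kmem_cache *s,
			struct page *page, void *x, void *addr)
	{
		void **object = (void *)x;
		struct kmem_cache_cpu *c;
		void **freelist;

		preempt_disable();
		c = get_cpu_slab(s, smp_processor_id());
		if (likely(page == c->page && c->node >= 0)) {	/* fast path test */
			do {
				freelist = c->freelist;
				object[c->offset] = freelist;	/* link object in */
			} while (cmpxchg_local(&c->freelist, freelist, object)
					!= freelist);
		} else {
			__slab_free(s, page, x, addr);		/* slow path */
		}
		preempt_enable();
	}
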