You are completely right: on x86_64, this got lost in the move to cmpxchg.h; here is the fix. It applies to 2.6.22-rc6-mm1.
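
(For context, the only functional difference between the two helpers is the LOCK
prefix on the cmpxchg instruction: the locked form is atomic across CPUs, the
local form is only atomic with respect to code running on the same CPU --
interrupt handlers, NMIs, or sections with preemption disabled. Reduced to the
8-byte case, the two variants boil down to roughly the following. This is a
simplified sketch for illustration -- the names are made up and it is not the
kernel's exact __cmpxchg()/__cmpxchg_local() code.)

/* Locked variant: atomic across all CPUs (what cmpxchg() provides). */
static inline unsigned long sample_cmpxchg8(volatile unsigned long *ptr,
					    unsigned long old, unsigned long new)
{
	unsigned long prev;

	asm volatile("lock; cmpxchgq %1,%2"
		     : "=a" (prev)
		     : "r" (new), "m" (*ptr), "0" (old)
		     : "memory");
	return prev;	/* caller compares prev with old to see if it won */
}

/*
 * Local variant: no LOCK prefix, so no bus lock; only atomic with respect
 * to the local CPU (per-cpu data accessed with preemption disabled, or
 * from interrupt/NMI context on the same CPU).
 */
static inline unsigned long sample_cmpxchg8_local(volatile unsigned long *ptr,
						  unsigned long old, unsigned long new)
{
	unsigned long prev;

	asm volatile("cmpxchgq %1,%2"
		     : "=a" (prev)
		     : "r" (new), "m" (*ptr), "0" (old)
		     : "memory");
	return prev;
}

With the one-line patch below, cmpxchg_local() maps to the second form instead
of the first.
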
x86_64 - Use non-locked version for local_cmpxchg()

local_cmpxchg() should not use any LOCK prefix. This change probably got lost
in the move to cmpxchg.h.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
---
 include/asm-x86_64/cmpxchg.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-lttng/include/asm-x86_64/cmpxchg.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86_64/cmpxchg.h	2007-07-10 01:10:10.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86_64/cmpxchg.h	2007-07-10 01:11:03.000000000 -0400
@@ -128,7 +128,7 @@
 	((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),\
 					(unsigned long)(n),sizeof(*(ptr))))
 #define cmpxchg_local(ptr,o,n)\
-	((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),\
+	((__typeof__(*(ptr)))__cmpxchg_local((ptr),(unsigned long)(o),\
 					(unsigned long)(n),sizeof(*(ptr))))
 
 #endif

* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Mon, 9 Jul 2007, Mathieu Desnoyers wrote:
> 
> > > > Yep, I deliberately used the variant without lock prefix because the
> > > > data is per cpu and I disable preemption.
> > > 
> > > local_cmpxchg generates this?
> > 
> > Yes.
> 
> Does not work here. If I use
> 
> static void __always_inline *slab_alloc(struct kmem_cache *s,
> 		gfp_t gfpflags, int node, void *addr)
> {
> 	void **object;
> 	struct kmem_cache_cpu *c;
> 
> 	preempt_disable();
> 	c = get_cpu_slab(s, smp_processor_id());
> redo:
> 	object = c->freelist;
> 	if (unlikely(!object || !node_match(c, node)))
> 		return __slab_alloc(s, gfpflags, node, addr, c);
> 
> 	if (cmpxchg_local(&c->freelist, object, object[c->offset]) != object)
> 		goto redo;
> 
> 	preempt_enable();
> 	if (unlikely((gfpflags & __GFP_ZERO)))
> 		memset(object, 0, c->objsize);
> 
> 	return object;
> }
> 
> Then the code will include a lock prefix:
> 
>     3270:	48 8b 1a             	mov    (%rdx),%rbx
>     3273:	48 85 db             	test   %rbx,%rbx
>     3276:	74 23                	je     329b <kmem_cache_alloc+0x4b>
>     3278:	8b 42 14             	mov    0x14(%rdx),%eax
>     327b:	4c 8b 0c c3          	mov    (%rbx,%rax,8),%r9
>     327f:	48 89 d8             	mov    %rbx,%rax
>     3282:	f0 4c 0f b1 0a       	lock cmpxchg %r9,(%rdx)
>     3287:	48 39 c3             	cmp    %rax,%rbx
>     328a:	75 e4                	jne    3270 <kmem_cache_alloc+0x20>
>     328c:	66 85 f6             	test   %si,%si
>     328f:	78 19                	js     32aa <kmem_cache_alloc+0x5a>
>     3291:	48 89 d8             	mov    %rbx,%rax
>     3294:	48 83 c4 08          	add    $0x8,%rsp
>     3298:	5b                   	pop    %rbx
>     3299:	c9                   	leaveq
>     329a:	c3                   	retq
> 
> > What applies to local_inc, given as example in the local_ops.txt
> > document, applies integrally to local_cmpxchg. And I would say that
> > local_cmpxchg is by far the cheapest locking mechanism I have found, and
> > use today, for my kernel tracer. The idea emerged from my need to trace
> > every execution context, including NMIs, while still providing good
> > performance. local_cmpxchg was the perfect fit; that's why I deployed
> > it in local.h in each and every architecture.
> 
> Great idea. The SLUB allocator may be able to use your idea to improve
> both the alloc and free path.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
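
As an illustration of the pattern the quoted discussion describes (per-cpu data,
preemption disabled, safe against interrupts and NMIs on the local CPU without
taking the bus lock), a buffer-reserve path built on local_cmpxchg() looks
roughly like the sketch below. The structure and function names are made up for
the example; this is not the actual LTTng or SLUB code.

#include <linux/types.h>
#include <asm/local.h>

/* Hypothetical per-cpu trace buffer: one instance per CPU. */
struct sample_trace_buf {
	local_t	offset;		/* current write offset into data[] */
	long	size;		/* total buffer size in bytes */
	char	*data;
};

/*
 * Reserve 'len' bytes. local_cmpxchg() retries if an interrupt or NMI on
 * this CPU reserved space between local_read() and the cmpxchg; no LOCK
 * prefix is needed because only this CPU touches the buffer.
 * Must be called with preemption disabled.
 */
static void *sample_reserve(struct sample_trace_buf *buf, long len)
{
	long old, new;

	do {
		old = local_read(&buf->offset);
		new = old + len;
		if (new > buf->size)
			return NULL;	/* buffer full, drop the event */
	} while (local_cmpxchg(&buf->offset, old, new) != old);

	return buf->data + old;
}

On x86_64 this compiles down to a plain (non-locked) cmpxchg on the per-cpu
offset, which is exactly what the patch above restores for cmpxchg_local().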