* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> I just want to rectify a detail: local_t uses type "long", which is 32
> bits on x86_32 and 64 bits on x86_64.
>
> Using a cmpxchg8b on i386 seems to require the LOCK prefix to be taken,
> so it may degrade performance too much. Therefore, you may prefer to
> stay with cli/sti on i386, but using a local cmpxchg would make sense on
> x86_64.
>
> A side-note: I really dislike the new cmpxchg behavior when a too
> large value is passed to it. If we pass a uint64_t * as first argument
> to cmpxchg or cmpxchg_local on i386, it just fails silently. Before, a
> linker error was produced, which required the kernel to be compiled with
> -O2 as side-effect, but at least there wasn't any silent failure...
>
> Mathieu
>

Actually, about the i386 case, I am trying to get my head around the
locked cmpxchg8b issue.

I just implemented something that could be added to the standard
cmpxchg on i386:

union pack64 {
	u64 value;
	struct {
		u32 low, high;
	};
};

static inline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
{
	union pack64 oldu;
	union pack64 newu;
	union pack64 prevu;

	oldu.value = old;
	newu.value = new;

	/*
	 * edx:eax holds the expected value, ecx:ebx the new value.
	 * %6 is already a memory operand ("m"), so it takes no extra
	 * parentheses.
	 */
	__asm__ __volatile__("cmpxchg8b %6\n\t"
		: "=d" (prevu.high), "=a" (prevu.low)
		: "0" (oldu.high), "1" (oldu.low),
		  "c" (newu.high), "b" (newu.low),
		  "m" (*__xg(ptr))
		: "memory");
	return prevu.value;	/* previous contents of *ptr */
}

I tried it with and without the LOCK prefix on my Pentium 4:

Locked cmpxchg8b:      90 cycles
Non-locked cmpxchg8b:  30 cycles
sti:                  166 cycles
cli:                  159 cycles

So, hrm, even if we use the locked version, it is still much faster than
the sti/cli.

I am wondering about the comment in asm-i386/system.h:

/*
 * The semantics of XCHGCMP8B are a bit strange, this is why
 * there is a loop and the loading of %%eax and %%edx has to
 * be inside. This inlines well in most cases, the cached
 * cost is around ~38 cycles. (in the future we might want
 * to do an SIMD/3DNOW!/MMX/FPU 64-bit store here, but that
 * might have an implicit FPU-save as a cost, so it's not
 * clear which path to go.)
 *
 * cmpxchg8b must be used with the lock prefix here to allow
 * the instruction to be executed atomically, see page 3-102
 * of the instruction set reference 24319102.pdf. We need
 * the reader side to see the coherent 64bit value.
 */

I just re-read 24319102.pdf page 3-102. Here is what seems to be the
reference used:

"This instruction can be used with a LOCK prefix to allow the
instruction to be executed atomically. To simplify the interface to the
processor’s bus, the destination operand receives a write cycle without
regard to the result of the comparison. The destination operand is
written back if the comparison fails; otherwise, the source operand is
written into the destination. (The processor never produces a locked
read without also producing a locked write.)" (page 3-102)

However, we find _exactly_ the same comment about the cmpxchg
instruction:

"This instruction can be used with a LOCK prefix to allow the
instruction to be executed atomically. To simplify the interface to the
processor’s bus, the destination operand receives a write cycle without
regard to the result of the comparison. The destination operand is
written back if the comparison fails; otherwise, the source operand is
written into the destination. (The processor never produces a locked
read without also producing a locked write.)" (page 3-100)

Since we use the standard cmpxchg without the lock prefix and consider
it atomic wrt the local CPU, I wonder why we could not do the same for
cmpxchg8b? Any idea?

It could then give you an 8-byte cmpxchg, atomic wrt the local CPU, for
30 cycles, which is really not so bad!

Or is there any undocumented funkiness about cmpxchg8b I should be aware
of?

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
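
For readers who want to see the semantics quoted from the manual in plain C, here is a minimal sketch of what the instruction does to its destination, followed by the usual retry loop a caller would build on top of it. It is an illustration only, not kernel code: cmpxchg8b_model() and local_add_return_model() are hypothetical names that do not exist in the kernel, and the model is ordinary non-atomic C, so it says nothing about whether the unlocked instruction is really atomic wrt the local CPU.

```c
#include <stdint.h>

/*
 * Hypothetical software model of the cmpxchg8b semantics quoted above.
 * Plain C, NOT atomic: it only mirrors what ends up in memory.
 */
static uint64_t cmpxchg8b_model(uint64_t *ptr, uint64_t old, uint64_t new_val)
{
	uint64_t prev = *ptr;

	/*
	 * The destination receives a write either way: the new value on a
	 * match, its own previous contents on a mismatch.
	 */
	*ptr = (prev == old) ? new_val : prev;

	return prev;	/* caller compares this against 'old' */
}

/*
 * Hypothetical caller: add 'delta' to a 64-bit counter by retrying until
 * the value read beforehand is still the one found by the compare.
 */
static uint64_t local_add_return_model(uint64_t *ptr, uint64_t delta)
{
	uint64_t old, prev;

	do {
		old = *ptr;
		prev = cmpxchg8b_model(ptr, old, old + delta);
	} while (prev != old);

	return old + delta;
}
```

With the real instruction, the loop in local_add_return_model() is what makes an interrupt arriving between the read and the cmpxchg8b harmless: the compare fails and the update is retried.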