Hi, On Wed, Jul 13, 2005 at 07:20:06PM -0700, [EMAIL PROTECTED] wrote: > Hi, > > I found _spin_lock used a LOCK instruction to make the following > operation "decb %0" atomic. As you know, LOCK instruction alone takes > almost 70 clock cycles to finish and this add lots of cost to the > _spin_lock. However _spin_unlock does not use this LOCK instruction and > it uses "movb $1,%0" instead since 4-byte writes on 4-byte aligned > addresses are atomic.
_spin_unlock does not need locked operations because when it is run, the code is already known to be the only one to hold the lock, so it can release it without checking what others do. > So I want rewrite the _spin_lock defined spinlock.h > (/linux/include/asm-i386) as follows to reduce the overhead of _spin_lock > and make it more efficient. It does not work. You cannot write an inter-cpu atomic test-and-set with several unlocked instructions. > #define spin_lock_string \ > "\n1:\t" \ > "cmpb $0,%0\n\t" \ > "jle 2f\n\t" \ ==> here, another thread or CPU can get the lock simultaneously. > "movb $0, %0\n\t" \ > "jmp 3f\n" \ > "2:\t" \ > "rep;nop\n\t" \ > "cmpb $0, %0\n\t" \ > "jle 2b\n\t" \ > "jmp 1b\n" \ > "3:\n\t" > > Compared with the original version as follows, LOCK instruction is > removed. I rebuilt the Intel e1000 Gigabit driver with this _spin_lock. > There is about 2% throughput improvement. > #define spin_lock_string \ > "\n1:\t" \ > "lock ; decb %0\n\t" \ > "jns 3f\n" \ > "2:\t" \ > "rep;nop\n\t" \ > "cmpb $0,%0\n\t" \ > "jle 2b\n\t" \ > "jmp 1b\n" \ > "3:\n\t" > > Do you think I can get a better performance if I dig further? > > Any ideas will be greatly appreciated, well, of course with those methods you can improve performance, but you lose the warranty that you're alone to get a lock, and that's bad. another similar method to get a lock in some very controlled environment is as follows : 1: cmp $0, %0 jne 1b mov $CPUID, %0 membar cmp $CPUID, %0 jne 1b This only works with same speed CPUs and interrupts disabled. But in todays environments, this is very risky (hyperthreaded CPUs, etc...). However, this is often OK for more deterministic CPUs such as microcontrollers. Regards, Willy - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/