Tom Lane <[EMAIL PROTECTED]> writes:

> > On contended case you'll want task switch anyway, so the futex
> > managing should not matter.
>
> No, we DON'T want a task switch.  That's the entire point: in a
> multiprocessor, it's a good bet that the spinlock is held by a task
> running on another processor, and doing a task switch will take orders
> of magnitude longer than just spinning until the lock is released.
> You should yield only after spinning long enough to make it a strong
> probability that the spinlock is held by a process that's lost the
> CPU and needs to be rescheduled.
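
For reference, the spin-then-yield policy described in the quoted text might look roughly like the C sketch below. It is only an illustration and not PostgreSQL's actual spinlock code: the names sy_lock, sy_try_lock, sy_lock_acquire and sy_lock_release, and the SPINS_BEFORE_YIELD threshold of 1000, are made up for the example. It assumes C11 atomics and POSIX sched_yield().

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tuning knob: polls to attempt before giving up the CPU. */
#define SPINS_BEFORE_YIELD 1000

typedef struct {
    atomic_bool held;           /* false = free, true = taken */
} sy_lock;

static void sy_lock_init(sy_lock *l)
{
    atomic_init(&l->held, false);
}

/* One test-and-set attempt; returns true if we got the lock. */
static bool sy_try_lock(sy_lock *l)
{
    /* exchange returns the previous value: false means it was free */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/*
 * Spin first, on the bet that the holder is running on another CPU and
 * will release soon.  Only after SPINS_BEFORE_YIELD failed attempts do
 * we assume the holder has lost its CPU, and yield so it can run.
 * Returns how many times we had to yield (0 in the uncontended case).
 */
static unsigned sy_lock_acquire(sy_lock *l)
{
    unsigned yields = 0;

    for (;;) {
        for (int i = 0; i < SPINS_BEFORE_YIELD; i++) {
            if (sy_try_lock(l))
                return yields;
        }
        sched_yield();
        yields++;
    }
}

static void sy_lock_release(sy_lock *l)
{
    /* Sanity check only: releasing a lock nobody holds is a bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The right threshold depends on how long the lock is normally held compared with the cost of a context switch, so the constant here is a placeholder rather than a recommendation.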

Does the futex code make any attempt to record the CPU of the process
grabbing the lock? Clearly it wouldn't be a guarantee of anything, but if
it's only used for short-lived spinlocks while acquiring longer-lived locks,
then maybe it would help?

> No; that page still says specifically "So a process calling
> sched_yield() now must wait until all other runnable processes in the
> system have used up their time slices before it will get the processor
> again."  I can prove that that is NOT what happens, at least not on
> a multi-CPU Opteron with current FC4 kernel.  However, if the newer
> kernels penalize a process calling sched_yield as heavily as this page
> claims, then it's not what we want anyway ...

Well, it would be no worse than select or any other random I/O syscall.

It seems to me what you've found is an outright bug in the Linux scheduler.
Perhaps posting it to linux-kernel would be worthwhile.

--
greg