On Mon, 18 Jun 2007 10:12:04 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:
> ----------------------------------------------------> > Subject: [patch] x86: fix spin-loop starvation bug > From: Ingo Molnar <[EMAIL PROTECTED]> > > Miklos Szeredi reported very long pauses (several seconds, sometimes > more) on his T60 (with a Core2Duo) which he managed to track down to > wait_task_inactive()'s open-coded busy-loop. He observed that an > interrupt on one core tries to acquire the runqueue-lock but does not > succeed in doing so for a very long time - while wait_task_inactive() on > the other core loops waiting for the first core to deschedule a task > (which it wont do while spinning in an interrupt handler). > > The problem is: both the spin_lock() code and the wait_task_inactive() > loop uses cpu_relax()/rep_nop(), so in theory the CPU should have > guaranteed MESI-fairness to the two cores - but that didnt happen: one > of the cores was able to monopolize the cacheline that holds the > runqueue lock, for extended periods of time. > > This patch changes the spin-loop to assert an atomic op after every REP > NOP instance - this will cause the CPU to express its "MESI interest" in > that cacheline after every REP NOP. Kiran, if you're still able to reproduce that zone->lru_lock starvation problem, this would be a good one to try... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/