On Wed, 9 May 2007, Nick Piggin wrote: > On Wed, May 09, 2007 at 12:41:24AM +0200, Nick Piggin wrote: > > On Wed, May 09, 2007 at 07:30:27AM +1000, Benjamin Herrenschmidt wrote: > > > > > > Waking them all would fix it but at the risk of causing other > > > problems... Maybe PG_waiters need to actually be a counter but if that > > > is the case, then it complicates things even more. > > > > It will wake up 1 exclusive waiter, but no limit on non exclusive waiters. > > Hmm, but it won't wake up waiters behind the exclusive guy... maybe the > > wake up code can check whether the waitqueue is still active after the > > wakeup, and set PG_waiters again in that case? > > Hm, I don't know if we can do that without a race either... > > OTOH, waking all non exclusive waiters may not be a really bad idea.
Not good enough, I'm afraid. It looks like Ben's right and you need a count - and counts in the page struct are a lot harder to add than page flags. I've now played around with the hangs on my three 4CPU machines (all of them in io_schedule below __lock_page, waiting on pages which were neither PG_locked nor PG_waiters when I looked). Seeing Ben's mail, I thought the answer would be just to remove the "_exclusive" from your three prepare_to_wait_exclusive()s. That helped, but it didn't eliminate the hangs. After fiddling around with different ideas for some while, I came to realize that the ClearPageWaiters (in very misleadingly named __unlock_page) is hopeless. It's just so easy for it to clear the PG_waiters that a third task relies upon for wakeup (and which cannot loop around to set it again, because it simply won't be woken by unlock_page/__unlock_page without it already being set). Below is the patch I've applied to see some tests actually running with your patches, but it's just a joke: absurdly racy and presumptuous in itself (the "3" stands for us and the cache and one waiter; I deleted the neighbouring mb and comment, not because I disagree, but because it's ridiculous to pay so much attention to such unlikely races when there's much worse nearby). Though I've not checked: if I've got the counting wrong, then maybe all my pages are left marked PG_waiters by now. (I did imagine we could go back to prepare_to_wait_exclusive once I'd put in the page_count test before ClearPageWaiters; but apparently not, that still hung.) My intention had been to apply the patches to what I tested before with lmbench, to get comparative numbers; but I don't think this is worth the time, it's too far from being a real solution. I was puzzled as to how you came up with any performance numbers yourself, when I could hardly boot. I see you mentioned 2CPU G5, I guess you need a CPU or two more; or maybe it's that you didn't watch what happened as it booted, often those hangs recover later. Hugh --- a/mm/filemap.c 2007-05-08 20:17:31.000000000 +0100 +++ b/mm/filemap.c 2007-05-09 19:14:03.000000000 +0100 @@ -517,13 +517,8 @@ EXPORT_SYMBOL(wait_on_page_bit); */ void fastcall __unlock_page(struct page *page) { - ClearPageWaiters(page); - /* - * The mb is necessary to enforce ordering between the clear_bit and - * the read of the waitqueue (to avoid SMP races with a parallel - * wait_on_page_locked() - */ - smp_mb__after_clear_bit(); + if (page_count(page) <= 3 + page_has_buffers(page)+page_mapcount(page)) + ClearPageWaiters(page); wake_up_page(page, PG_locked); } EXPORT_SYMBOL(__unlock_page); @@ -558,7 +553,7 @@ void fastcall __lock_page(struct page *p DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); do { - prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE); SetPageWaiters(page); if (likely(PageLocked(page))) sync_page(page); @@ -577,7 +572,7 @@ void fastcall __lock_page_nosync(struct DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); do { - prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE); SetPageWaiters(page); if (likely(PageLocked(page))) io_schedule(); @@ -591,7 +586,7 @@ void fastcall __wait_on_page_locked(stru DEFINE_WAIT_BIT(wait, &page->flags, PG_locked); do { - prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE); SetPageWaiters(page); if (likely(PageLocked(page))) sync_page(page); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/