On Tue, Feb 22, 2005 at 10:34:57PM +0000, Jamie Lokier wrote:
> There is one small but important error: the "return ret" mustn't just
> return. It must call unqueue_me(&q) just like the code at out_unqueue,
> _including_ the conditional "ret = 0", but _excluding_ the up_read().

Not only that, but someone might already have dequeued us, right? It's
probably a pathological case though, i.e. someone did a wake on the same
(bad) address.

How's this patch? It's closer to Linus' pseudo-code than Andrew's, to
avoid the extra get_user() at function entry and keep the common case
path short.

It also includes the feedback from Andrew on the sys_get_mempolicy(),
making the patch even simpler there.

----

Some futex functions do get_user calls while holding mmap_sem for
reading. If get_user() faults, and another thread happens to be in mmap
(or somewhere else holding waiting on down_write for the same
semaphore), then do_page_fault will deadlock. Most architectures seem
to be exposed to this.

To avoid it, make sure the page is available. If not, release the
semaphore, fault it in and retry.

I also found another exposure by inspection, moving some of the code
around avoids the possible deadlock there.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: linux-2.5/kernel/futex.c
===================================================================
--- linux-2.5.orig/kernel/futex.c	2005-02-21 16:09:38.000000000 -0600
+++ linux-2.5/kernel/futex.c	2005-02-22 16:38:24.000000000 -0600
@@ -329,6 +329,7 @@
 	int ret, drop_count = 0;
 	unsigned int nqueued;
 
+ retry:
 	down_read(&current->mm->mmap_sem);
 
 	ret = get_futex_key(uaddr1, &key1);
@@ -355,9 +356,19 @@
 		   before *uaddr1. */
 		smp_mb();
 
-		if (get_user(curval, (int __user *)uaddr1) != 0) {
-			ret = -EFAULT;
-			goto out;
+		inc_preempt_count();
+		ret = get_user(curval, (int __user *)uaddr1);
+		dec_preempt_count();
+
+		if (unlikely(ret)) {
+			up_read(&current->mm->mmap_sem);
+			/* Re-do the access outside the lock */
+			ret = get_user(curval, (int __user *)uaddr1);
+
+			if (!ret)
+				goto retry;
+
+			return ret;
 		}
 		if (curval != *valp) {
 			ret = -EAGAIN;
@@ -480,6 +491,7 @@
 	int ret, curval;
 	struct futex_q q;
 
+ retry:
 	down_read(&current->mm->mmap_sem);
 
 	ret = get_futex_key(uaddr, &q.key);
@@ -508,9 +520,21 @@
 	 * We hold the mmap semaphore, so the mapping cannot have changed
 	 * since we looked it up in get_futex_key.
 	 */
-	if (get_user(curval, (int __user *)uaddr) != 0) {
-		ret = -EFAULT;
-		goto out_unqueue;
+	inc_preempt_count();
+	ret = get_user(curval, (int __user *)uaddr);
+	dec_preempt_count();
+	if (unlikely(ret)) {
+		up_read(&current->mm->mmap_sem);
+
+		if (!unqueue_me(&q)) /* There's a chance we got woken already */
+			return 0;
+
+		/* Re-do the access outside the lock */
+		ret = get_user(curval, (int __user *)uaddr);
+
+		if (!ret)
+			goto retry;
+		return ret;
 	}
 	if (curval != val) {
 		ret = -EWOULDBLOCK;
Index: linux-2.5/mm/mempolicy.c
===================================================================
--- linux-2.5.orig/mm/mempolicy.c	2005-02-04 00:27:40.000000000 -0600
+++ linux-2.5/mm/mempolicy.c	2005-02-22 14:34:19.000000000 -0600
@@ -524,9 +524,13 @@
 	} else
 		pval = pol->policy;
 
-	err = -EFAULT;
+	if (vma) {
+		up_read(&current->mm->mmap_sem);
+		vma = NULL;
+	}
+
 	if (policy && put_user(pval, policy))
-		goto out;
+		return -EFAULT;
 
 	err = 0;
 	if (nmask) {
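
For anyone who wants to see the "try without faulting, else drop the lock, fault it in and retry" idea outside the kernel, here is a rough user-space sketch. It is not kernel code and not part of the patch: every name in it (lock_read, read_nofault, read_may_fault, read_with_retry) is a hypothetical stand-in, the "lock" is a plain counter that only models read holds on mmap_sem, and the assert in the slow path stands in for the deadlock that a real fault under the semaphore could hit.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not kernel APIs. */
static int lock_readers;       /* models read holds on mmap_sem */
static bool page_resident;     /* is the "user page" present? */
static int user_word = 42;     /* value stored in that page */

static void lock_read(void)   { lock_readers++; }
static void unlock_read(void) { assert(lock_readers > 0); lock_readers--; }

/* Fast path: models get_user() with faults disabled. Fails, never faults. */
static int read_nofault(int *out)
{
	if (!page_resident)
		return -1;
	*out = user_word;
	return 0;
}

/* Slow path: may "fault" the page in. The fault handler wants the lock
 * itself, so entering here with the lock held models the deadlock. */
static int read_may_fault(int *out)
{
	assert(lock_readers == 0);
	page_resident = true;
	*out = user_word;
	return 0;
}

/* Reads the word under the lock; *retries counts how often the lock
 * had to be dropped so the page could be faulted in. */
static int read_with_retry(int *out, int *retries)
{
	int ret;

	assert(out && retries);
	*retries = 0;
retry:
	lock_read();
	ret = read_nofault(out);
	if (ret) {
		unlock_read();
		/* Re-do the access outside the lock */
		ret = read_may_fault(out);
		if (!ret) {
			(*retries)++;
			goto retry;
		}
		return ret;
	}
	/* ... work that needs both the lock and the value would go here ... */
	unlock_read();
	return 0;
}
```

Calling read_with_retry(&v, &n) on a non-resident page takes the slow path once and then succeeds on the second pass under the lock. Later calls stay on the fast path, which is the property the patch tries to keep for the common case.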