On Tue, 2006-07-11 at 07:15, Herbert Xu wrote: > On Tue, Jul 11, 2006 at 03:57:31AM -0700, Andrew Morton wrote: > > > > > >> down_write(&listeners->sem); > > > >> list_for_each_entry_safe(s, tmp, &listeners->list, list) { > > > >> - ret = genlmsg_unicast(skb, s->pid); > > > >> + skb_next = NULL; > > > >> + if (!list_islast(&s->list, &listeners->list)) { > > > >> + skb_next = skb_clone(skb_cur, GFP_KERNEL); > > > > > > > > If we do a GFP_KERNEL allocation with this semaphore held, and the > > > > oom-killer tries to kill something to satisfy the allocation, and the > > > > killed task gets stuck on that semaphore, I wonder of the box locks up. > > > > > > We do GFP_KERNEL inside semaphores/mutexes in lots of places. So if this > > > can deadlock with the oom-killer we probably should fix that, preferably > > > by having GFP_KERNEL fail in that case. > > > > This lock is special, in that it's taken on the exit() path (I think). So > > it can block tasks which are trying to exit. > > Sorry, missed the context. > > If there is a deadlock then it's not just this allocation that you > need worry about. There is also an allocation within genlmsg_uniast > that would be GFP_KERNEL.
Remove down_write() from taskstats code invoked on the exit() path. In send_cpu_listeners(), which is called on the exit path, a down_write() was protecting operations like skb_clone() and genlmsg_unicast() that do GFP_KERNEL allocations. If the oom-killer decides to kill tasks to satisfy the allocations,the exit of those tasks could block on the same semphore. The down_write() was only needed to allow removal of invalid listeners from the listener list. The patch converts the down_write to a down_read and defers the removal to a separate critical region. This ensures that even if the oom-killer is called, no other task's exit is blocked as it can still acquire another down_read. Thanks to Andrew Morton & Herbert Xu for pointing out the oom related pitfalls, and to Chandra Seetharaman for suggesting this fix instead of using something more complex like RCU. Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]> Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]> --- kernel/taskstats.c | 24 +++++++++++++++++++----- 1 files changed, 19 insertions(+), 5 deletions(-) Index: linux-2.6.18-rc1/kernel/taskstats.c =================================================================== --- linux-2.6.18-rc1.orig/kernel/taskstats.c 2006-07-11 00:13:00.000000000 -0400 +++ linux-2.6.18-rc1/kernel/taskstats.c 2006-07-12 00:07:53.000000000 -0400 @@ -51,6 +51,7 @@ __read_mostly = { struct listener { struct list_head list; pid_t pid; + char valid; }; struct listener_list { @@ -127,7 +128,7 @@ static int send_cpu_listeners(struct sk_ struct listener *s, *tmp; struct sk_buff *skb_next, *skb_cur = skb; void *reply = genlmsg_data(genlhdr); - int rc, ret; + int rc, ret, delcount = 0; rc = genlmsg_end(skb, reply); if (rc < 0) { @@ -137,7 +138,7 @@ static int send_cpu_listeners(struct sk_ rc = 0; listeners = &per_cpu(listener_array, cpu); - down_write(&listeners->sem); + down_read(&listeners->sem); list_for_each_entry_safe(s, tmp, &listeners->list, list) { skb_next = NULL; if (!list_islast(&s->list, &listeners->list)) { @@ -150,14 +151,26 @@ static int send_cpu_listeners(struct sk_ } ret = genlmsg_unicast(skb_cur, s->pid); if (ret == -ECONNREFUSED) { - list_del(&s->list); - kfree(s); + s->valid = 0; + delcount++; rc = ret; } skb_cur = skb_next; } - up_write(&listeners->sem); + up_read(&listeners->sem); + + if (!delcount) + return rc; + /* Delete invalidated entries */ + down_write(&listeners->sem); + list_for_each_entry_safe(s, tmp, &listeners->list, list) { + if (!s->valid) { + list_del(&s->list); + kfree(s); + } + } + up_write(&listeners->sem); return rc; } @@ -290,6 +303,7 @@ static int add_del_listener(pid_t pid, c goto cleanup; s->pid = pid; INIT_LIST_HEAD(&s->list); + s->valid = 1; listeners = &per_cpu(listener_array, cpu); down_write(&listeners->sem); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html