On Thu, Mar 07, 2013 at 10:01:39AM -0800, Tejun Heo wrote:
> Hello, Oleg.
> 
> On Thu, Mar 07, 2013 at 06:25:45PM +0100, Oleg Nesterov wrote:
> > > [  944.011126] Chain exists of:
> > >   &sb->s_type->i_mutex_key#9 --> cgroup_mutex --> &sig->cred_guard_mutex
> > >
> > > [  944.012745]  Possible unsafe locking scenario:
> > >
> > > [  944.013617]        CPU0                    CPU1
> > > [  944.014280]        ----                    ----
> > > [  944.014942]   lock(&sig->cred_guard_mutex);
> > > [  944.021332]                                lock(cgroup_mutex);
> > > [  944.028094]                                
> > > lock(&sig->cred_guard_mutex);
> > > [  944.035007]   lock(&sb->s_type->i_mutex_key#9);
> > > [  944.041602]
> > 
> > And cgroup_mount() does i_mutex -> cgroup_mutex...
> 
> Hmmm...
> 
> > Add cc's. I do not think we can move open_exec() outside of
> > cred_guard_mutex. We can change do_execve_common(), but
> > binfmt->load_binary() does open() too.
> > 
> > And it is not easy to avoid ->cred_guard_mutex in threadgroup_lock(),
> > we can't change de_thread() to do threadgroup_change_begin/end...
> > 
> > Or perhaps we can? It doesn't need to sleep under ->group_rwsem; we only
> > need it around the ->group_leader change. Otherwise cgroup_attach_proc()
> > can rely on do_exit()->threadgroup_change_begin() ?
> 
> Using cred_guard_mutex was mostly to avoid adding another lock in the
> de_thread() path as it already had one.  We can add group_rwsem
> locking deeper inside and avoid this problem.
> 
> > But perhaps someone can suggest another fix in cgroup.c.
> 
> Another possibility is moving cgroup_lock outside threadgroup_lock(),
> which was impossible before because of cgroup_lock abuses in specific
> controller implementations, but most of those have been updated and we
> should now be pretty close to being able to make cgroup_lock outer to
> most other locks.  Appending a completely untested patch below.
> 
> Li, what do you think?

Oops, it was the wrong patch.  Here's the correct one.

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index a32f943..e7e5e57 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2193,17 +2193,13 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
        const struct cred *cred = current_cred(), *tcred;
        int ret;
 
-       if (!cgroup_lock_live_group(cgrp))
-               return -ENODEV;
-
 retry_find_task:
        rcu_read_lock();
        if (pid) {
                tsk = find_task_by_vpid(pid);
                if (!tsk) {
                        rcu_read_unlock();
-                       ret= -ESRCH;
-                       goto out_unlock_cgroup;
+                       return -ESRCH;
                }
                /*
                 * even if we're attaching all tasks in the thread group, we
@@ -2214,8 +2210,7 @@ retry_find_task:
                    !uid_eq(cred->euid, tcred->uid) &&
                    !uid_eq(cred->euid, tcred->suid)) {
                        rcu_read_unlock();
-                       ret = -EACCES;
-                       goto out_unlock_cgroup;
+                       return -EACCES;
                }
        } else
                tsk = current;
@@ -2229,36 +2224,37 @@ retry_find_task:
         * with no rt_runtime allocated.  Just say no.
         */
        if (tsk == kthreadd_task || (tsk->flags & PF_THREAD_BOUND)) {
-               ret = -EINVAL;
                rcu_read_unlock();
-               goto out_unlock_cgroup;
+               return -EINVAL;
        }
 
        get_task_struct(tsk);
        rcu_read_unlock();
 
        threadgroup_lock(tsk);
-       if (threadgroup) {
-               if (!thread_group_leader(tsk)) {
-                       /*
-                        * a race with de_thread from another thread's exec()
-                        * may strip us of our leadership, if this happens,
-                        * there is no choice but to throw this task away and
-                        * try again; this is
-                        * "double-double-toil-and-trouble-check locking".
-                        */
-                       threadgroup_unlock(tsk);
-                       put_task_struct(tsk);
-                       goto retry_find_task;
-               }
-               ret = cgroup_attach_proc(cgrp, tsk);
-       } else
-               ret = cgroup_attach_task(cgrp, tsk);
-       threadgroup_unlock(tsk);
+       if (threadgroup && !thread_group_leader(tsk)) {
+               /*
+                * a race with de_thread from another thread's exec() may
+                * strip us of our leadership, if this happens, there is no
+                * choice but to throw this task away and try again; this
+                * is "double-double-toil-and-trouble-check locking".
+                */
+               threadgroup_unlock(tsk);
+               put_task_struct(tsk);
+               goto retry_find_task;
+       }
 
+       ret = -ENODEV;
+       if (cgroup_lock_live_group(cgrp)) {
+               if (threadgroup)
+                       ret = cgroup_attach_proc(cgrp, tsk);
+               else 
+                       ret = cgroup_attach_task(cgrp, tsk);
+               cgroup_unlock();
+       }
+
+       threadgroup_unlock(tsk);
        put_task_struct(tsk);
-out_unlock_cgroup:
-       cgroup_unlock();
        return ret;
 }
 