Currently, there's nothing explicitly preventing
cgroup_enable_task_cg_lists() from missing set PF_EXITING and race
against cgroup_exit(), and, depending on the timing, cgroup_exit()
seemingly may finish with the task still linked on css_set leading to
list corruption because cgroup_enable_task_cg_lists() can end up
linking it after list_empty(&tsk->cg_list) test in cgroup_exit().

This can't really happen because exit_mm() grabs and release
task_lock() between setting of PF_EXITING and cgroup_exit(), and
cgroup_enable_task_cg_lists() synchronizes against task_lock too;
however, this is fragile and more of a happy accident.  Let's make the
synchronization explicit by making cgroup_enable_task_cg_lists() grab
siglock around PF_EXITING testing.

This whole on-demand cg_list optimization is extremely fragile and has
ample possibility to lead to bugs which can cause things like
once-a-year oops during boot.  I'm wondering whether the better
approach would be just adding "cgroup_disable=all" handling which
disables the whole cgroup rather than tempting fate with this dynamic
optimization craziness.

v2: Li pointed out that the race condition can't actually happen due
    to task_lock locking in exit_mm().  Updated the patch description
    accordingly and dropped -stable cc.

Signed-off-by: Tejun Heo <t...@kernel.org>
Cc: Li Zefan <lize...@huawei.com>
---
 kernel/cgroup.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1331,9 +1331,13 @@ static void cgroup_enable_task_cg_lists(
                 * We should check if the process is exiting, otherwise
                 * it will race with cgroup_exit() in that the list
                 * entry won't be deleted though the process has exited.
+                * Do it while holding siglock so that we don't end up
+                * racing against cgroup_exit().
                 */
+               spin_lock_irq(&p->sighand->siglock);
                if (!(p->flags & PF_EXITING))
                        list_add(&p->cg_list, &task_css_set(p)->tasks);
+               spin_unlock_irq(&p->sighand->siglock);
 
                task_unlock(p);
        } while_each_thread(g, p);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to