From: Srivatsa S. Bhat <sriva...@csail.mit.edu> The existing patch which disallows unprivileged CLONE_NEWUSER applies the check for CAP_SYS_ADMIN capability on the 'init_user_ns' namespace, which is not entirely correct. Consider the following sequence:
1. A process with root privileges calls clone(child_fn, ..., CLONE_NEWUSER, ...) to create a new user namespace. 2. child_fn, now running in the newly created user namespace enjoys the full set of capabilities in the new user namespace, but has lost its capabilities in the old user namespace (init_user_ns in this case). 3. child_fn now calls clone(child_fn2, ..., CLONE_NEWUSER, ...) to create a new (nested) user namespace. Step 3 should have succeeded because child_fn has full privileges (including CAP_SYS_ADMIN) in its user namespace, but this step fails because the CAP_SYS_ADMIN capability is checked against init_user_ns, as opposed to child_fn's user namespace. So fix this by checking for CAP_SYS_ADMIN using ns_capable() on the current task's user namespace. This also helps the userns07 testcase from LTP (testcases/kernel/containers/userns/userns07.c) to pass when running with root privileges. Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu> --- Applies on debian kernel's master branch. ...low-unprivileged-CLONE_NEWUSER-by-default.patch | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch index 55edbc7..fc978ea 100644 --- a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch +++ b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch @@ -12,6 +12,9 @@ issues are found, we have a fail-safe. Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com> [bwh: Remove unneeded binary sysctl bits] +[Srivatsa: Fix capability checks when running nested user namespaces +by using ns_capable() on the current task's user namespace.] +Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu> --- --- a/kernel/fork.c +++ b/kernel/fork.c @@ -27,24 +30,24 @@ Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com> /* * Minimum number of threads to boot the kernel -@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_stru +@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_struct *copy_process( if ((clone_flags & (CLONE_NEWUSER|CLONE_FS)) == (CLONE_NEWUSER|CLONE_FS)) return ERR_PTR(-EINVAL); + if ((clone_flags & CLONE_NEWUSER) && !unprivileged_userns_clone) -+ if (!capable(CAP_SYS_ADMIN)) ++ if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN)) + return ERR_PTR(-EPERM); + /* * Thread groups must share signals as well, and detached threads * can only be started up within the thread group. -@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long, +@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags) if (unshare_flags & CLONE_NEWNS) unshare_flags |= CLONE_FS; + if ((unshare_flags & CLONE_NEWUSER) && !unprivileged_userns_clone) { + err = -EPERM; -+ if (!capable(CAP_SYS_ADMIN)) ++ if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN)) + goto bad_unshare_out; + } +