Quoting Srivatsa S. Bhat (sriva...@csail.mit.edu): > From: Srivatsa S. Bhat <sriva...@csail.mit.edu> > > The existing patch which disallows unprivileged CLONE_NEWUSER applies > the check for CAP_SYS_ADMIN capability on the 'init_user_ns' > namespace, which is not entirely correct. Consider the following sequence: > > 1. A process with root privileges calls > clone(child_fn, ..., CLONE_NEWUSER, ...) to create a new user namespace. > > 2. child_fn, now running in the newly created user namespace enjoys the > full set of capabilities in the new user namespace, but has lost > its capabilities in the old user namespace (init_user_ns in this > case). > > 3. child_fn now calls > clone(child_fn2, ..., CLONE_NEWUSER, ...) to create a new (nested) > user namespace. > > Step 3 should have succeeded because child_fn has full privileges
No, it should not. If the host has unprivileged_userns_clone=false, then it should require privilege against the init_user_ns to change or bypass that. Yes the userns was *created* by someone with that privilege, but likely they did so precisely so that they could run more sandboxed code. > (including CAP_SYS_ADMIN) in its user namespace, but this step fails > because the CAP_SYS_ADMIN capability is checked against init_user_ns, > as opposed to child_fn's user namespace. > > So fix this by checking for CAP_SYS_ADMIN using ns_capable() on the > current task's user namespace. > > This also helps the userns07 testcase from LTP > (testcases/kernel/containers/userns/userns07.c) to pass when running > with root privileges. If you want to run that test (which checks for >=32 level nesting depth) with privilege, surely you can just set unprivileged_userns_clone=true? > Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu> > --- > Applies on debian kernel's master branch. > > ...low-unprivileged-CLONE_NEWUSER-by-default.patch | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git > a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch > > b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch > index 55edbc7..fc978ea 100644 > --- > a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch > +++ > b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch > @@ -12,6 +12,9 @@ issues are found, we have a fail-safe. > > Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com> > [bwh: Remove unneeded binary sysctl bits] > +[Srivatsa: Fix capability checks when running nested user namespaces > +by using ns_capable() on the current task's user namespace.] > +Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu> > --- > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -27,24 +30,24 @@ Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com> > > /* > * Minimum number of threads to boot the kernel > -@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_stru > +@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_struct > *copy_process( > if ((clone_flags & (CLONE_NEWUSER|CLONE_FS)) == > (CLONE_NEWUSER|CLONE_FS)) > return ERR_PTR(-EINVAL); > > + if ((clone_flags & CLONE_NEWUSER) && !unprivileged_userns_clone) > -+ if (!capable(CAP_SYS_ADMIN)) > ++ if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN)) > + return ERR_PTR(-EPERM); > + > /* > * Thread groups must share signals as well, and detached threads > * can only be started up within the thread group. > -@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long, > +@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags) > if (unshare_flags & CLONE_NEWNS) > unshare_flags |= CLONE_FS; > > + if ((unshare_flags & CLONE_NEWUSER) && !unprivileged_userns_clone) { > + err = -EPERM; > -+ if (!capable(CAP_SYS_ADMIN)) > ++ if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN)) > + goto bad_unshare_out; > + } > +