Quoting Srivatsa S. Bhat (sriva...@csail.mit.edu):
> From: Srivatsa S. Bhat <sriva...@csail.mit.edu>
> 
> The existing patch which disallows unprivileged CLONE_NEWUSER applies
> the check for CAP_SYS_ADMIN capability on the 'init_user_ns'
> namespace, which is not entirely correct. Consider the following sequence:
> 
> 1. A process with root privileges calls
>    clone(child_fn, ..., CLONE_NEWUSER, ...) to create a new user namespace.
> 
> 2. child_fn, now running in the newly created user namespace enjoys the
>    full set of capabilities in the new user namespace, but has lost
>    its capabilities in the old user namespace (init_user_ns in this
>    case).
> 
> 3. child_fn now calls
>    clone(child_fn2, ..., CLONE_NEWUSER, ...) to create a new (nested)
>    user namespace.
> 
> Step 3 should have succeeded because child_fn has full privileges

No, it should not.  If the host has unprivileged_userns_clone=false,
then it should require privilege against the init_user_ns to change
or bypass that.  Yes the userns was *created* by someone with that
privilege, but likely they did so precisely so that they could run
more sandboxed code.

> (including CAP_SYS_ADMIN) in its user namespace, but this step fails
> because the CAP_SYS_ADMIN capability is checked against init_user_ns,
> as opposed to child_fn's user namespace.
> 
> So fix this by checking for CAP_SYS_ADMIN using ns_capable() on the
> current task's user namespace.
> 
> This also helps the userns07 testcase from LTP
> (testcases/kernel/containers/userns/userns07.c) to pass when running
> with root privileges.

If you want to run that test (which checks for >=32 level nesting depth)
with privilege, surely you can just set unprivileged_userns_clone=true?

> Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>
> ---
> Applies on debian kernel's master branch.
> 
>  ...low-unprivileged-CLONE_NEWUSER-by-default.patch |   11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git 
> a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch
>  
> b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch
> index 55edbc7..fc978ea 100644
> --- 
> a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch
> +++ 
> b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch
> @@ -12,6 +12,9 @@ issues are found, we have a fail-safe.
>  
>  Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com>
>  [bwh: Remove unneeded binary sysctl bits]
> +[Srivatsa: Fix capability checks when running nested user namespaces
> +by using ns_capable() on the current task's user namespace.]
> +Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>
>  ---
>  --- a/kernel/fork.c
>  +++ b/kernel/fork.c
> @@ -27,24 +30,24 @@ Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com>
>   
>   /*
>    * Minimum number of threads to boot the kernel
> -@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_stru
> +@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_struct 
> *copy_process(
>       if ((clone_flags & (CLONE_NEWUSER|CLONE_FS)) == 
> (CLONE_NEWUSER|CLONE_FS))
>               return ERR_PTR(-EINVAL);
>   
>  +    if ((clone_flags & CLONE_NEWUSER) && !unprivileged_userns_clone)
> -+            if (!capable(CAP_SYS_ADMIN))
> ++            if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>  +                    return ERR_PTR(-EPERM);
>  +
>       /*
>        * Thread groups must share signals as well, and detached threads
>        * can only be started up within the thread group.
> -@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long,
> +@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
>       if (unshare_flags & CLONE_NEWNS)
>               unshare_flags |= CLONE_FS;
>   
>  +    if ((unshare_flags & CLONE_NEWUSER) && !unprivileged_userns_clone) {
>  +            err = -EPERM;
> -+            if (!capable(CAP_SYS_ADMIN))
> ++            if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>  +                    goto bad_unshare_out;
>  +    }
>  +

Reply via email to