Kirill Tkhai <ktk...@virtuozzo.com> writes:

> During checkpointing and restore of userspace tasks
> we bumped into a situation where it's not possible
> to restore tasks whose user namespace does not
> have uid 0 or gid 0 mapped.
>
> People create user namespace mappings as they like,
> and there is no requirement that uid 0 and gid 0
> "must be mapped". So, if there is no uid 0 or gid 0
> in the mapping, it's impossible to restore mm->exe_file
> of the processes belonging to this user namespace.
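Why the old check can never pass in such a namespace follows from how ids are translated. Here is a minimal sketch, assuming the `/proc/[pid]/uid_map` line format `<id inside> <id outside> <count>`. The names `struct extent`, `model_map_down()` and `INVALID_ID` are hypothetical, and the function is a simplified userspace model of the kernel's lookup. Like that lookup, it produces `(u32)-1` (INVALID_UID) for an id that has no mapping, which is the value `make_kuid(ns, 0)` yields when uid 0 is not mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel for "no mapping", analogous to the kernel's (u32)-1 (INVALID_UID). */
#define INVALID_ID ((uint32_t)-1)

/* One uid_map line "<first> <lower_first> <count>": ids
 * [first, first + count) inside the namespace correspond to
 * [lower_first, lower_first + count) in the parent namespace. */
struct extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Hypothetical model of the downward id lookup: translate a
 * namespace-local id to the parent's id, or INVALID_ID if unmapped. */
uint32_t model_map_down(const struct extent *map, size_t n, uint32_t id)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Unsigned arithmetic: an id below map[i].first wraps to a huge
		 * offset and fails the range test, so one comparison suffices. */
		uint32_t off = id - map[i].first;
		if (off < map[i].count)
			return map[i].lower_first + off;
	}
	return INVALID_ID;
}
```

For the single-line map `1000 100000 1`, `model_map_down(map, 1, 1000)` returns 100000 and `model_map_down(map, 1, 0)` returns `INVALID_ID`. Any task with a valid uid therefore fails the old `uid_eq(cred->uid, make_kuid(ns, 0))` comparison, whatever privileges it holds.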
>
> Also, there is no workaround. It's impossible
> to create a temporary uid/gid mapping, because
> only one write to /proc/[pid]/uid_map and gid_map
> is allowed during a namespace's lifetime.
> If there is an entry, then no more mappings can be
> written. If there isn't an entry, we can't write
> one either, otherwise the user task won't be able
> to write its own mapping in the future.
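The "no workaround" argument rests on the write-once rule for the map files. Here is a minimal sketch of that rule, assuming an empty map accepts exactly one successful write and every later write fails with EPERM, which is what the kernel reports in this case. The names `struct id_map`, `model_map_write()` and `MODEL_MAX_EXTENTS` are hypothetical and for illustration only. The extent limit of 5 and the rejection of empty writes are choices made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_EXTENTS 5	/* arbitrary bound for this sketch */

struct extent {
	uint32_t first, lower_first, count;
};

struct id_map {
	size_t nr_extents;
	struct extent extent[MODEL_MAX_EXTENTS];
};

/* Hypothetical model of the write-once rule for /proc/[pid]/uid_map
 * and gid_map: a write succeeds only while the map is still empty. */
int model_map_write(struct id_map *map, const struct extent *ext, size_t n)
{
	assert(map != NULL && (ext != NULL || n == 0));
	if (map->nr_extents != 0)
		return -EPERM;		/* already written once: refuse */
	if (n == 0 || n > MODEL_MAX_EXTENTS)
		return -EINVAL;		/* sketch's own validation */
	for (size_t i = 0; i < n; i++)
		map->extent[i] = ext[i];
	map->nr_extents = n;
	return 0;
}
```

In this model, a restorer that wrote a temporary `0 100000 1` map would make the task's own later write of its real map fail with -EPERM. Leaving the map empty instead keeps uid 0 unmapped. That is the dilemma the commit message describes.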
>
> The patch changes the check to look for CAP_SYS_ADMIN
> instead of zero uid and gid. This allows restoring
> a task independently of its user namespace mappings.

Applied, thanks.  Testing against 0 in the modern kernel is perhaps the
most bizarre permission check I have seen lately.

Eric
>
> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>
> CC: Andrew Morton <a...@linux-foundation.org>
> CC: Serge Hallyn <se...@hallyn.com>
> CC: "Eric W. Biederman" <ebied...@xmission.com>
> CC: Oleg Nesterov <o...@redhat.com>
> CC: Michal Hocko <mho...@suse.com>
> CC: Andrei Vagin <ava...@openvz.org>
> CC: Cyrill Gorcunov <gorcu...@openvz.org>
> CC: Stanislav Kinsburskiy <skinsbur...@virtuozzo.com>
> CC: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>
> ---
>  kernel/sys.c |    8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 8a94b4eabcaa..7c6d78148fa0 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1802,15 +1802,11 @@ static int validate_prctl_map(struct prctl_mm_map *prctl_map)
>  
>       /*
>        * Finally, make sure the caller has the rights to
> -      * change /proc/pid/exe link: only local root should
> +      * change /proc/pid/exe link: only local sys admin should
>        * be allowed to.
>        */
>       if (prctl_map->exe_fd != (u32)-1) {
> -             struct user_namespace *ns = current_user_ns();
> -             const struct cred *cred = current_cred();
> -
> -             if (!uid_eq(cred->uid, make_kuid(ns, 0)) ||
> -                 !gid_eq(cred->gid, make_kgid(ns, 0)))
> +             if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>                       goto out;
>       }
>  
