On 02/24, Pavel Tikhomirov wrote:
>
> This moves the condition (tid != 1 && !tmp->child_reaper) to after idr
> alloc, so it not only covers that first process in pid namespace has pid
> 1 in case of clone3(set_tid) requesting wrong pid, but also if idr
> itself gives wrong pid for some reason.
>
> This could've been the case before this patch, when creating first
> process the alloc_pid()->pidfs_add_pid() code path fails, so that the
> idr->idr_next is non zero anymore and next process calling to
> alloc_pid(), will get 2 as a pid from idr_alloc_cyclic(). Effectively
> leading to init-less pid namespace, which is a bug.

Yes.

alloc_pid() does:

        /* On failure to allocate the first pid, reset the state */
        if (ns->pid_allocated == PIDNS_ADDING)
                idr_set_cursor(&ns->idr, 0);

but this logic is broken.

Suppose that a task P does sys_unshare(CLONE_NEWPID). Then it does
fork(), and fork() fails for any reason after alloc_pid() succeeds.
If P does another fork() to retry, we have a bug.

So with this patch we can either remove the code above, or (better)
improve this logic.

> Note: This is also a preparation for the next patch in the series, which
> will introduce an ability of creating init from the task different to
> the task which had created the pid namespace. Needed to make sure that
> init is always first, even in this new case.
>
> Suggested-by: Oleg Nesterov <[email protected]>
> Signed-off-by: Pavel Tikhomirov <[email protected]>

Signed-off-by: Oleg Nesterov <[email protected]>


> @@ -296,9 +290,18 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t 
> *arg_set_tid,
>
>               pid->numbers[i].nr = nr;
>               pid->numbers[i].ns = tmp;
> -             tmp = tmp->parent;
>               i--;
>               retried_preload = false;
> +
> +             /*
> +              * PID 1 (init) must be created first.
> +              */
> +             if (!READ_ONCE(tmp->child_reaper) && nr != 1) {
> +                     retval = -EINVAL;
> +                     goto out_free;
> +             }
> +
> +             tmp = tmp->parent;
>       }

Cosmetic, but why did you move "tmp = tmp->parent;" down? This is fine
but not strictly necessary. OTOH, if you do this, perhaps it makes sense
to move "retried_preload = false;" as well?

Oleg.


Reply via email to