On 02/24, Pavel Tikhomirov wrote:
>
> This moves the condition (tid != 1 && !tmp->child_reaper) to after idr
> alloc, so it not only covers that first process in pid namespace has pid
> 1 in case of clone3(set_tid) requesting wrong pid, but also if idr
> itself gives wrong pid for some reason.
>
> This could've been the case before this patch, when creating first
> process the alloc_pid()->pidfs_add_pid() code path fails, so that the
> idr->idr_next is non zero anymore and next process calling to
> alloc_pid(), will get 2 as a pid from idr_alloc_cyclic(). Effectively
> leading to init-less pid namespace, which is a bug.
Yes.
alloc_pid() does:
/* On failure to allocate the first pid, reset the state */
if (ns->pid_allocated == PIDNS_ADDING)
idr_set_cursor(&ns->idr, 0);
but this logic is broken.
Suppose that a task P does sys_unshare(CLONE_NEWPID). Then it does
fork(), and fork() fails for any reason after alloc_pid() succeeds.
If P does another fork() to retry, we have a bug.
So with this patch we can either remove the code above, or (better)
improve this logic.
> Note: This is also a preparation for the next patch in the series, which
> will introduce an ability of creating init from the task different to
> the task which had created the pid namespace. Needed to make sure that
> init is always first, even in this new case.
>
> Suggested-by: Oleg Nesterov <[email protected]>
> Signed-off-by: Pavel Tikhomirov <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
> @@ -296,9 +290,18 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t
> *arg_set_tid,
>
> pid->numbers[i].nr = nr;
> pid->numbers[i].ns = tmp;
> - tmp = tmp->parent;
> i--;
> retried_preload = false;
> +
> + /*
> + * PID 1 (init) must be created first.
> + */
> + if (!READ_ONCE(tmp->child_reaper) && nr != 1) {
> + retval = -EINVAL;
> + goto out_free;
> + }
> +
> + tmp = tmp->parent;
> }
Cosmetic, but why did you move "tmp = tmp->parent;" down? This is fine
but not strictly necessary. OTOH, if you do this, perhaps it makes sense
to move "retried_preload = false;" as well?
Oleg.