On 02/25, Oleg Nesterov wrote: > > On 02/24, Pavel Tikhomirov wrote: > > > > This moves the condition (tid != 1 && !tmp->child_reaper) to after idr > > alloc, so it not only covers that first process in pid namespace has pid > > 1 in case of clone3(set_tid) requesting wrong pid, but also if idr > > itself gives wrong pid for some reason. > > > > This could've been the case before this patch, when creating first > > process the alloc_pid()->pidfs_add_pid() code path fails, so that the > > idr->idr_next is non zero anymore and next process calling to > > alloc_pid(), will get 2 as a pid from idr_alloc_cyclic(). Effectively > > leading to init-less pid namespace, which is a bug. > > Yes. > > alloc_pid() does: > > /* On failure to allocate the first pid, reset the state */ > if (ns->pid_allocated == PIDNS_ADDING) > idr_set_cursor(&ns->idr, 0); > > but this logic is broken. > > Suppose that a task P does sys_unshare(CLONE_NEWPID). Then it does > fork(), and fork() fails for any reason after alloc_pid() succeeds. > If P does another fork() to retry, we have a bug. > > So with this patch we can either remove the code above, or (better) > improve this logic.
Damn, no, I was wrong ;) As Andrei pointed out, free_pid() will clear PIDNS_ADDING in this case. Still I think the patch is fine. Oleg.

