On 2/25/26 10:06, Oleg Nesterov wrote:
> On 02/24, Andrei Vagin wrote:
>>
>> On Tue, Feb 24, 2026 at 9:02 AM Pavel Tikhomirov
>> <[email protected]> wrote:
>>>
>>> This moves the condition (tid != 1 && !tmp->child_reaper) to after idr
>>> alloc, so it not only covers that first process in pid namespace has pid
>>> 1 in case of clone3(set_tid) requesting wrong pid, but also if idr
>>> itself gives wrong pid for some reason.
>>>
>>> This could've been the case before this patch, when creating first
>>> process the alloc_pid()->pidfs_add_pid() code path fails, so that the
>>> idr->idr_next is non zero anymore and next process calling to
>>> alloc_pid(), will get 2 as a pid from idr_alloc_cyclic(). Effectively
>>> leading to init-less pid namespace, which is a bug.
>>
>> I don't think this is an issue right now, as we only allow one attempt
>> to create the init. If that first attempt fails, free_pid will disable
>> further PID allocations in the target namespace.
>
> Thanks, you are right!
I guess it means that we should change the commit message a bit, e.g.: replace
"Effectively leading ..." to "Though thanks to PIDNS_ADDING logic, free_pid()
disables further pid allocation in this case and it does not lead to any real
problem.".
Please advise if I should resend the whole series (v4) for that?
>
>> This isn't directly related to this patch, but why do we check
>> "ns->pid_allocated & PIDNS_ADDING" after allocating all pids? Wouldn't
>> it be more reasonable to do that right after taking the pidmap_lock?
>
> Perhaps, but alloc_pid() can drop pidmap_lock and retry, so we will
> need to re-check PIDNS_ADDING.
Maybe it was this way to consolidate the (->pid_allocated & PIDNS_ADDING)
check and the (->pid_allocated++) change in one place for readability.
Code reads like: we don't want the increment if the flag is unset.
>
> Oleg.
>
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.