Hi,Eric > As I understand it this patch has two purposes: > 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS > 2. panic as early as possible so exiting threads don't removing > interesting debugging state.
Your understanding is very correct,this is what my patch wants to do > I think if we are going to move the decrement of signal->live that > should be it's own patch and be accompanied with a good description of > why it is safe instead of having the decrement of signal->live be there > as a side effect of another change. I will think about the risks of movement of the decrement of signal->live before exit_signal(). If is difficult to judge movement of the decrement of signal->live is safe,how about only test 'signal->live==1' not use group_dead? Such as: diff --git a/kernel/exit.c b/kernel/exit.c index 04029e3..87f3595 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -767,6 +767,17 @@ void __noreturn do_exit(long code) validate_creds_for_do_exit(tsk); /* + * If global init has exited, + * panic immediately to get a useable coredump. + */ + if (unlikely(is_global_init(tsk) && + ((atomic_read(&tsk->signal->live) == 1) || /*current is last init thread*/ + (tsk->signal->flags & SIGNAL_GROUP_EXIT)))) { + panic("Attempted to kill init! exitcode=0x%08x\n", + tsk->signal->group_exit_code ?: (int)code); + } + + /* * We're taking recursive faults here in do_exit. Safest is to just * leave this task alone and wait for reboot. */ @@ -784,16 +795,9 @@ void __noreturn do_exit(long code) if (tsk->mm) sync_mm_rss(tsk->mm); acct_update_integrals(tsk); + group_dead = atomic_dec_and_test(&tsk->signal->live); if (group_dead) { - /* - * If the last thread of global init has exited, panic - * immediately to get a useable coredump. - */ - if (unlikely(is_global_init(tsk))) - panic("Attempted to kill init! exitcode=0x%08x\n", - tsk->signal->group_exit_code ?: (int)code); - Eric W. Biederman <ebied...@xmission.com> 于2021年3月19日周五 上午3:09写道: > > Oleg Nesterov <o...@redhat.com> writes: > > > On 03/18, qianli zhao wrote: > >> > >> Hi,Oleg > >> > >> Thank you for your reply. > >> > >> >> When init sub-threads running on different CPUs exit at the same time, > >> >> zap_pid_ns_processe()->BUG() may be happened. > >> > >> > and why do you think your patch can't prevent this? > >> > >> > Sorry, I must have missed something. But it seems to me that you are > >> > trying > >> > to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called > >> > in > >> > the root namespace, and this has nothing to do with CONFIG_PID_NS. > >> > >> Yes, i try to fix this exception by test SIGNAL_GROUP_EXIT and call > >> panic before setting PF_EXITING to prevent zap_pid_ns_processes() > >> being called when init do_exit(). > > > > Ah, I didn't notice your patch does atomic_dec_and_test(signal->live) > > before exit_signals() which sets PF_EXITING. Thanks for correcting me. > > > > So yes, I was wrong, your patch can prevent this. Although I'd like to > > recheck if every do-something-if-group-dead action is correct in the > > case we have a non-PF_EXITING thread... > > > > But then I don't understand the SIGNAL_GROUP_EXIT check added by your > > patch. Do we really need it if we want to avoid zap_pid_ns_processes() > > when the global init exits? > > > >> In addition, the patch also protects the init process state to > >> successfully get usable init coredump. > > > > Could you spell please? > > > > Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want > > to panic earlier, before other init's sub-threads exit? > > That is my understanding. > > As I understand it this patch has two purposes: > 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS > 2. panic as early as possible so exiting threads don't removing > interesting debugging state. > > > It is a bit tricky to tell if the movement of the decrement of > signal->live is safe. That affects current_is_single threaded > which is used by unshare, setns of the time namespace, and setting > the selinux part of creds. > > The usage in kernel/cgroup/cgroup.c:css_task_iter_advance seems safe. > Hmm, Maybe not. Today cgroup_thread_change_begin is held around > setting PF_EXITING before signal->live is decremented. So there seem to > be some subtle cgroup dependencies. > > The usages of group_dead in do_exit seem safe, as except for the new > one everything is the same. > > We could definitely take advantage of knowing group_dead in exit_signals > to simplify it's optimization to not rerouting signals to living > threads. > > > I think if we are going to move the decrement of signal->live that > should be it's own patch and be accompanied with a good description of > why it is safe instead of having the decrement of signal->live be there > as a side effect of another change. > > Eric