Oleg Nesterov <o...@redhat.com> writes: > On 03/18, qianli zhao wrote: >> >> Hi,Oleg >> >> Thank you for your reply. >> >> >> When init sub-threads running on different CPUs exit at the same time, >> >> zap_pid_ns_processe()->BUG() may be happened. >> >> > and why do you think your patch can't prevent this? >> >> > Sorry, I must have missed something. But it seems to me that you are trying >> > to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in >> > the root namespace, and this has nothing to do with CONFIG_PID_NS. >> >> Yes, i try to fix this exception by test SIGNAL_GROUP_EXIT and call >> panic before setting PF_EXITING to prevent zap_pid_ns_processes() >> being called when init do_exit(). > > Ah, I didn't notice your patch does atomic_dec_and_test(signal->live) > before exit_signals() which sets PF_EXITING. Thanks for correcting me. > > So yes, I was wrong, your patch can prevent this. Although I'd like to > recheck if every do-something-if-group-dead action is correct in the > case we have a non-PF_EXITING thread... > > But then I don't understand the SIGNAL_GROUP_EXIT check added by your > patch. Do we really need it if we want to avoid zap_pid_ns_processes() > when the global init exits? > >> In addition, the patch also protects the init process state to >> successfully get usable init coredump. > > Could you spell please? > > Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want > to panic earlier, before other init's sub-threads exit?
That is my understanding. As I understand it this patch has two purposes: 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS 2. panic as early as possible so exiting threads don't removing interesting debugging state. It is a bit tricky to tell if the movement of the decrement of signal->live is safe. That affects current_is_single threaded which is used by unshare, setns of the time namespace, and setting the selinux part of creds. The usage in kernel/cgroup/cgroup.c:css_task_iter_advance seems safe. Hmm, Maybe not. Today cgroup_thread_change_begin is held around setting PF_EXITING before signal->live is decremented. So there seem to be some subtle cgroup dependencies. The usages of group_dead in do_exit seem safe, as except for the new one everything is the same. We could definitely take advantage of knowing group_dead in exit_signals to simplify it's optimization to not rerouting signals to living threads. I think if we are going to move the decrement of signal->live that should be it's own patch and be accompanied with a good description of why it is safe instead of having the decrement of signal->live be there as a side effect of another change. Eric