On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart <nathandboss...@gmail.com> wrote:
> On Tue, Feb 21, 2023 at 09:03:27AM +0900, Michael Paquier wrote:
> > Perhaps beginning a new thread with a patch and a summary would be
> > better at this stage?  Another thing I am wondering is if it could be
> > possible to test that rather reliably.  I have been playing with a few
> > scenarios like holding the system() call for a bit with hardcoded
> > sleep()s, without much success.  I'll try harder on that part..  It's
> > been mentioned as well that we could just move away from system() in
> > the long-term.
>
> I'm happy to create a new thread if needed, but I can't tell if there is
> any interest in this stopgap/back-branch fix.  Perhaps we should just jump
> straight to the long-term fix that Thomas is looking into.

Unfortunately the latch-friendly subprocess module proposal I was
talking about would be for 17.  I may post a thread fairly soon with
design ideas + list of problems and decision points as I see them, and
hopefully some sketch code, but it won't be a proposal for [/me checks
calendar] next week's commitfest and probably wouldn't be appropriate
in a final commitfest anyway, and I also have some other existing
stuff to clear first.  So please do continue with the stopgap ideas.

BTW, here's an idea (untested) about how to reproduce the problem.  You
could copy the source from a system() implementation, call it
doomed_system(), and insert kill(-getppid(), SIGQUIT) in between
sigprocmask(SIG_SETMASK, &omask, NULL) and exec*().  Parent and self
will handle the signal and both reach the proc_exit().

The systems that failed are running code like this:

https://github.com/openbsd/src/blob/master/lib/libc/stdlib/system.c
https://github.com/DragonFlyBSD/DragonFlyBSD/blob/master/lib/libc/stdlib/system.c

I'm pretty sure these other implementations could fail in just the
same way (they restore the handler before unblocking, so can run it
just before exec() replaces the image):

https://github.com/freebsd/freebsd-src/blob/main/lib/libc/stdlib/system.c
https://github.com/lattera/glibc/blob/master/sysdeps/posix/system.c

The glibc one is a bit busier and, huh, has a lock (I guess maybe
deadlockable if proc_exit() also calls system(), but hopefully it
doesn't), and uses fork() instead of vfork(), but I don't think that's
a material difference here (with fork(), parent and child run
concurrently, while with vfork() the parent is suspended until the
child exits or execs, and then processes its pending signals).
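
For illustration only, here is a rough sketch of what a doomed_system()
along those lines might look like.  It is not copied from any of the
libc implementations linked above and has not been tried against
PostgreSQL.  The before_exec hook parameter and the
signal_parent_group() helper are made-up names, the sketch uses fork()
rather than vfork() so that calling a hook in the child is well
defined, and unlike a real system() it does not ignore SIGINT/SIGQUIT
in the parent while waiting.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical test helper: a stripped-down system() lookalike.  If
 * before_exec is not NULL, the child calls it after its signal mask has
 * been restored and just before exec, which is the window discussed above.
 * Returns the wait status of the shell, or -1 on failure.
 */
int
doomed_system(const char *command, void (*before_exec)(void))
{
	sigset_t	mask, omask;
	pid_t		pid;
	int			status = -1;

	assert(command != NULL);

	/* Block SIGCHLD around the fork, as system() implementations do. */
	sigemptyset(&mask);
	sigaddset(&mask, SIGCHLD);
	sigprocmask(SIG_BLOCK, &mask, &omask);

	pid = fork();
	if (pid == 0)
	{
		/* Child: still running the parent's image and signal handlers. */
		sigprocmask(SIG_SETMASK, &omask, NULL);

		/* Assumed injection point: handlers can fire here, before exec. */
		if (before_exec != NULL)
			before_exec();

		execl("/bin/sh", "sh", "-c", command, (char *) NULL);
		_exit(127);				/* exec failed */
	}
	if (pid > 0)
	{
		/* Parent: wait for the shell, retrying if interrupted. */
		while (waitpid(pid, &status, 0) == -1)
		{
			if (errno != EINTR)
			{
				status = -1;
				break;
			}
		}
	}

	sigprocmask(SIG_SETMASK, &omask, NULL);
	return status;
}

/*
 * Hypothetical hook implementing the suggestion above: send SIGQUIT to the
 * process group whose ID equals the parent's PID.
 */
void
signal_parent_group(void)
{
	kill(-getppid(), SIGQUIT);
}
```

With a NULL hook this behaves like an ordinary (simplified) system()
call.  Passing signal_parent_group as the hook, e.g.
doomed_system("true", signal_parent_group), delivers SIGQUIT to both
the caller and the not-yet-exec'd child, but only if the caller leads
its own process group, since kill() with a negative argument targets a
process group ID.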