On Tue, Jun 9, 2026 at 8:08 AM Florian Weimer <[email protected]> wrote:
>
> * Jann Horn:
>
> >> Per the above, the primary win would stem from *NOT* messing with mm.
> >
> > As you write below, I think we have that with CLONE_VM? The C function
> > vfork() is kind of a terrible API because of its returns-twice
> > behavior, but I think if process cloning with CLONE_VM|CLONE_VFORK was
> > wrapped by libc in a way similar to clone() (with the child executing
> > a separate handler function), or if it was used in the implementation
> > of some higher-level process-spawning API, it would be a perfectly
> > fine API?
>
> No, there is still a problem with SIGTSTP handling because we cannot
> atomically unmask the signal during execve. We need to unblock SIGTSTP
> before execve in the new process, but this means that it can get
> suspended by SIGTSTP. Consequently, the execve never happens and the
> original process is stuck in vfork:
>
> posix_spawn: parent can get stuck in uninterruptible sleep if child
> receives SIGTSTP early enough
>
> <https://inbox.sourceware.org/libc-help/[email protected]/>
>
> More on the low-level side, it's difficult to make sure that execve gets
> a consistent snapshot of the environ vector. Both vfork and execve need
> to be async-signal-safe. Any locking or memory allocation (except for
> the stack …) persists in the original process after vfork returns. The

I think that's not entirely accurate; if you call set_robust_list() on a
futex list, then call execve(), the futexes should be released once the
process switches to a new MM, in begin_new_exec -> exec_mmap ->
exec_mm_release -> futex_exec_release -> futex_cleanup ->
exit_robust_list. So in theory you could use clone() with CLONE_VM and
without CLONE_VFORK, and let the parent either wait for a futex that is
released on exec, or somehow asynchronously check later whether the
futex is still held... probably not the nicest building block but maybe
workable? Though I guess it would fit more nicely if there was a
"munmap() this range on exec" API...

> environ vector can be large, so making a copy on the stack is not ideal.
> It's even harder for getenv/setenv/unsetenv implementations that use
> locking instead of software transactional memory.

Makes sense, that kind of sounds like a pain inherent in being able to
execute from signal handler context...
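The clone()-style wrapper Jann asks about in the quoted text could look roughly like the sketch below, assuming Linux, glibc's clone() wrapper, and a hypothetical spawn_vfork() helper and struct vfork_args (neither is an existing libc API). The parent blocks all signals so that no handler runs in the child on the shared memory, the child restores the original mask and calls execve(), and CLONE_VFORK keeps the parent suspended until the child has exec'd or exited. That suspension is also what makes it safe for the child to report an execve() failure through the shared struct. The sigprocmask() call in the child marks the window Florian describes: if SIGTSTP is unblocked there and arrives before execve(), the child stops and the parent stays stuck in clone().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical argument block; with CLONE_VM the child writes it directly. */
struct vfork_args {
    const char *path;
    char *const *argv;
    char *const *envp;
    sigset_t mask; /* signal mask to restore in the child before execve */
    int err;       /* set by the child if execve fails */
};

/* Child handler: runs on its own stack inside the parent's address space. */
static int vfork_child(void *p)
{
    struct vfork_args *a = p;
    /* Race window from the thread: once SIGTSTP is unblocked here, the child
       can be stopped before execve, leaving the parent blocked in clone(). */
    sigprocmask(SIG_SETMASK, &a->mask, NULL);
    execve(a->path, a->argv, a->envp);
    a->err = errno; /* parent is suspended, so sharing errno is safe here */
    _exit(127);
}

/* Hypothetical helper: returns the child's pid, or -1 with errno set. */
pid_t spawn_vfork(const char *path, char *const argv[], char *const envp[])
{
    enum { STACK_SIZE = 64 * 1024 };
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    struct vfork_args a = { .path = path, .argv = argv, .envp = envp, .err = 0 };
    sigset_t all;
    sigfillset(&all);
    /* Block signals so no handler runs in the child on the shared memory;
       the child inherits this mask and restores the old one itself. */
    sigprocmask(SIG_BLOCK, &all, &a.mask);

    /* The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(vfork_child, stack + STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, &a);
    int clone_err = errno;

    sigprocmask(SIG_SETMASK, &a.mask, NULL);
    free(stack); /* child has exec'd or exited: its stack is unused */

    if (pid == -1) {
        errno = clone_err;
        return -1;
    }
    if (a.err != 0) {
        waitpid(pid, NULL, 0); /* reap the child that failed to exec */
        errno = a.err;
        return -1;
    }
    return pid;
}
```

A caller would reap the returned pid with waitpid() as usual.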

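Jann's alternative (clone() with CLONE_VM but without CLONE_VFORK, plus a robust futex that the kernel releases on exec) might be sketched as follows. Assumptions: Linux, glibc's clone() and syscall() wrappers, a hypothetical spawn_robust() helper and struct robust_spawn, and CLONE_PARENT_SETTID as one way (an assumption of this sketch, not something from the thread) to get the child's TID into the futex word before the child runs. The child registers a one-entry robust list with set_robust_list() and calls execve(). When it releases the shared mm, on a successful exec or on exit, exit_robust_list() sets FUTEX_OWNER_DIED in the word and, because the parent set FUTEX_WAITERS before sleeping, wakes it.

```c
#define _GNU_SOURCE
#include <linux/futex.h> /* FUTEX_*, struct robust_list, struct robust_list_head */
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>    /* callers reap the child with waitpid() */
#include <unistd.h>

/* Hypothetical per-spawn state, shared with the child through CLONE_VM. */
struct robust_spawn {
    struct robust_list_head head; /* registered by the child */
    struct robust_list entry;     /* the list's only element */
    uint32_t futex;               /* child TID, later FUTEX_OWNER_DIED */
    const char *path;
    char *const *argv;
    char *const *envp;
};

static int robust_child(void *arg)
{
    struct robust_spawn *s = arg;
    /* Robust lists are per task; the kernel walks this one when the child
       releases the shared mm, i.e. on a successful execve() or on exit.
       Caveat: if the child dies before this call, the parent waits forever. */
    syscall(SYS_set_robust_list, &s->head, sizeof s->head);
    execve(s->path, s->argv, s->envp);
    /* Caveat: without CLONE_SETTLS the child shares the parent thread's TLS,
       so a failing execve() writes the (still running) parent's errno. */
    _exit(127); /* exiting also runs the robust-list cleanup */
}

/* Sleep until the kernel marks the futex owner as dead. */
static void wait_owner_died(uint32_t *word)
{
    for (;;) {
        uint32_t v = __atomic_load_n(word, __ATOMIC_ACQUIRE);
        if (v & FUTEX_OWNER_DIED)
            return;
        /* The kernel only issues a wake if FUTEX_WAITERS is set. */
        uint32_t w = v | FUTEX_WAITERS;
        if (v != w && !__atomic_compare_exchange_n(word, &v, w, 0,
                                                   __ATOMIC_ACQ_REL,
                                                   __ATOMIC_ACQUIRE))
            continue; /* word changed under us: re-check */
        /* Shared (non-PRIVATE) op to match the kernel's robust-list wake;
           EAGAIN/EINTR just loop back to the re-check. */
        syscall(SYS_futex, word, FUTEX_WAIT, w, NULL, NULL, 0);
    }
}

/* Hypothetical helper: returns the child's pid once it has exec'd or
   exited, or -1 if clone() fails. The caller reaps it with waitpid(). */
pid_t spawn_robust(const char *path, char *const argv[], char *const envp[])
{
    enum { STACK_SIZE = 64 * 1024 };
    struct robust_spawn *s = calloc(1, sizeof *s);
    char *stack = malloc(STACK_SIZE);
    if (s == NULL || stack == NULL) {
        free(s);
        free(stack);
        return -1;
    }
    s->path = path;
    s->argv = argv;
    s->envp = envp;
    /* Circular one-element list: head.list -> entry -> head.list. */
    s->head.list.next = &s->entry;
    s->entry.next = &s->head.list;
    s->head.futex_offset = (long)offsetof(struct robust_spawn, futex)
                         - (long)offsetof(struct robust_spawn, entry);
    s->head.list_op_pending = NULL;

    /* CLONE_PARENT_SETTID stores the child's TID in s->futex before the
       child runs, so the word names the child as owner from the start. */
    pid_t pid = clone(robust_child, stack + STACK_SIZE,
                      CLONE_VM | CLONE_PARENT_SETTID | SIGCHLD, s,
                      (pid_t *)&s->futex, NULL, NULL);
    if (pid != -1)
        wait_owner_died(&s->futex);
    /* The child has left this address space (or never started). */
    free(stack);
    free(s);
    return pid;
}
```

Moving the wait_owner_died() call out of spawn_robust() would give the "check later" variant; either way the caller still reaps the child with waitpid().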
