* Jann Horn:

>> Per the above, the primary win would stem from *NOT* messing with mm.
>
> As you write below, I think we have that with CLONE_MM? The C function
> vfork() is kind of a terrible API because of its returns-twice
> behavior, but I think if process cloning with CLONE_VM|CLONE_VFORK was
> wrapped by libc in a way similar to clone() (with the child executing
> a separate handler function), or if it was used in the implementation
> of some higher-level process-spawning API, it would be a perfectly
> fine API?

No, there is still a problem with SIGTSTP handling because we cannot
atomically unmask the signal during execve.  We need to unblock SIGTSTP
before execve in the new process, but this means that it can get
suspended by SIGTSTP.  Consequently, the execve never happens and the
original process is stuck in vfork:

  posix_spawn: parent can get stuck in uninterruptible sleep if child
  receives SIGTSTP early enough
  
<https://inbox.sourceware.org/libc-help/[email protected]/>

More on the low-level side, it's difficult to make sure that execve gets
a consistent snapshot of the environ vector.  Both vfork and execve need
to be async-signal-safe.  Any locking or memory allocation (except for
the stack …) persists in the original process after vfork returns.  The
environ vector can be large, so making a copy on the stack is not ideal.
It's even harder for getenv/setenv/unsetenv implementations that use
locking instead of software transactional memory.

In general, I prefer the vfork+execve API over things like posix_spawn
because eventually, you have dependencies between the syslets, or need
control flow.  This introduces a lot of complexity.  Conceptually,
vfork+execve is much simpler, and in many ways quite safe (even mutexes
work as long as they do not need a correct TID).

Thanks,
Florian


Reply via email to