On Thu, 26 Feb 2026 15:16:33 GMT, Thomas Stuefe <[email protected]> wrote:

> When starting child processes from Java, we bootstrap the child process after 
> fork and before exec. As part of that process, up to five pipes are handed to 
> the child: three for stdin/out/err, respectively, and two internal 
> communication pipes (fail and childenv).
> 
> If, concurrently with our invocation of `ProcessBuilder.start()`, third-party 
> native code forks a child of its own, the natively forked child carries 
> copies of these pipes. It then may keep these pipes open. This results in 
> various forms of communication errors, most likely hangs - either in 
> `ProcessBuilder.start()`, or in customer code. 
> 
> In the customer case that started this investigation, 
> `ProcessBuilder.start()` hung intermittently when using a third-party Eclipse 
> library that happened to perform forks natively.
> 
> The JVM has no full control over what happens in its process, since we allow 
> native code to run. Therefore, native forks can happen, and we have to work 
> around them. 
> 
> The fix makes sure that the pipes we use in ProcessBuilder are always tagged 
> with CLOEXEC. Since forks are typically followed by execs, this will close 
> any file descriptors that were accidentally inherited.
> 
> ### FORK/VFORK mode
> 
> Here, it is sufficient to open all our pipes with O_CLOEXEC.
> 
> The caveat here is that only Linux offers an API to do that cleanly: 
> `pipe2(2)` ([1]). On macOS and AIX, we don't have `pipe2(2)`, so we need to 
> emulate that behavior using `pipe(2)` and `fcntl(2)` in quick succession. 
> That is still racy, since we did not completely close the time window within 
> which pipe file descriptors are not O_CLOEXEC. But this is the best we can do.
> 
> ### POSIX_SPAWN mode
> 
> Creating the pipes with CLOEXEC alone is not sufficient. With 
> `posix_spawn(3)`, we exec twice: first to load the jspawnhelper (inside 
> `posix_spawn(3)`), a second time to load the target binary. Pipes created 
> with O_CLOEXEC would not survive the first exec.
> 
> Therefore, instead of manually `dup2(2)`'ing our file descriptors after the 
> first exec in jspawnhelper itself, we set up dup2 file actions to let 
> posix_spawn do the dup'ing. According to POSIX, these dup2 file actions will 
> be processed before the kernel closes the inherited CLOEXEC file descriptors.
> 
> Unfortunately, macOS is again not POSIX-compliant, since the macOS kernel can 
> close CLOEXEC file descriptors before posix_spawn processes them in its dup2 
> file actions. As a workaround for that bug, we create temporary copies of the 
> pipe file descriptors that are untagged with CLOEXEC and use ...

Hi Thomas,

Thanks a lot for finding this issue, describing it in such detail, and creating 
regression tests for it.

At first glance, the changes look OK, but I'll have to take a closer look 
next week.

I am just a little concerned about the ever-increasing code complexity in this 
area. Have you thought about using Unix domain sockets with `socketpair()` 
instead of pipes for the parent/child communication? That might be simpler and 
more portable, although I haven't really tried it out yet.
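
For illustration only, a minimal and untested-in-the-JDK sketch of what such a 
channel could look like. The helper names `set_cloexec` and 
`open_cloexec_socketpair` are hypothetical and not part of the JDK sources. It 
assumes `SOCK_CLOEXEC` is available on Linux and falls back to `fcntl(2)` 
elsewhere, which leaves the same race window described for `pipe(2)` plus 
`fcntl(2)`.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not JDK code): set FD_CLOEXEC on fd.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper (not JDK code): create a connected AF_UNIX stream
 * socket pair with both ends marked close-on-exec. Each end is
 * bidirectional, so one pair could carry traffic in both directions.
 * Returns 0 on success, -1 on error. */
int open_cloexec_socketpair(int sv[2]) {
#ifdef SOCK_CLOEXEC
    /* Where the platform supports it (e.g. Linux), the flag is applied
     * atomically at creation time, so there is no unprotected window. */
    return socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv);
#else
    /* Fallback: create first, tag afterwards. This is NOT atomic; a
     * concurrent native fork can still inherit untagged descriptors. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }
    if (set_cloexec(sv[0]) == -1 || set_cloexec(sv[1]) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
#endif
}
```

Note that this only covers creating the descriptors. In POSIX_SPAWN mode the 
child's end would still have to survive the first exec into jspawnhelper, so 
it is an open question whether this would actually reduce the complexity.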

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29939#issuecomment-3974413937
