When starting child processes from Java, we bootstrap the child process after 
fork and before exec. As part of that process, up to five pipes are handed to 
the child: three for stdin/out/err, respectively, and two internal 
communication pipes (fail and childenv).

If, concurrently with our invocation of `ProcessBuilder.start()`, third-party 
native code forks a child of its own, the natively forked child carries copies 
of these pipes. It then may keep these pipes open. This results in various 
forms of communication errors, most likely hangs - either in 
`ProcessBuilder.start()`, or in customer code. 

In the customer case that started this investigation, `ProcessBuilder.start()` 
hung intermittently when using a third-party Eclipse library that happened to 
perform forks natively.

The JVM has no full control over what happens in its process, since we allow 
native code to run. Therefore, native forks can happen, and we have to work 
around them. 

The fix makes sure that the pipes we use in ProcessBuilder are always tagged 
with CLOEXEC. Since forks are typically followed by execs, this will close any 
file descriptors that were accidentally inherited.

### FORK/VFORK mode

Here, it is sufficient to open all our pipes with O_CLOEXEC.

The caveat here is that only Linux offers an API to do that cleanly: `pipe2(2)` 
([1]). On MacOS and AIX, we don't have `pipe2(2)`, so we need to emulate that 
behavior using `pipe(2)` and `fcntl(2)` in quick succession. That is still 
racy, since we did not completely close the time window within which pipe file 
descriptors are not O_CLOEXEC. But this is the best we can do.

### POSIX_SPAWN mode

Creating the pipes with CLOEXEC alone is not sufficient. With `posix_spawn(3)`, 
we exec twice: first to load the jspawnhelper (inside `posix_spawn(3)`), a 
second time to load the target binary. Pipes created with O_CLOEXEC would not 
survive the first exec.

Therefore, instead of manually `dup2(2)`'ing our file descriptors after the 
first exec in jspawnhelper itself, we set up dup2 file actions to let 
posix_spawn do the dup'ing. According to POSIX, these dup2 file actions will be 
processed before the kernel closes the inherited CLOEXEC file descriptors.

Unfortunately, macOS is again not POSIX-compliant, since the macOS kernel can 
close CLOEXEC file descriptors before posix_spawn processes them in its dup2 
file actions. As a workaround for that bug, we create temporary copies of the 
pipe file descriptors that are untagged with CLOEXEC and use them as sources 
for posix_spawn dup2 actions. Of course, these file descriptor copies would 
again leak to concurrently forked processes. But since these are only temporary 
copies, that is benign.

### Code changes

We replace the manual file descriptor dup'ing in jspawnhelper with posix_spawn 
file actions. That means we can scrap everything associated with that: it is no 
longer necessary to pass file descriptors via command-line arguments to 
jspawnhelper or via the `ChildStuff` structure.

However, I tried to keep code reshuffling minimal and this patch as small as 
possible. While working on this code, I had a number of ideas for streamlining 
and improvements, but I will hold that off for future RFRs. This patch aims to 
be small and easily backportable.

-----------

Testing:

The patch adds three new tests:

1) `ConcNativeForkTest` is a reproducer for the "native concurrent forks cause 
hangs in ProcessBuilder.start()" problem. It is pretty good at provoking hangs 
in unpatched JVMs. However, it is not suited for automated CI/CD environments 
since it essentially fork-bombs itself. Therefore, I marked the test as 
"manual".

2) `PipesCloseOnExecTest` is a modified variant of `ConcNativeForkTest` that, 
instead of executing concurrent native forks, continuously checks that no file 
descriptors pop up untagged with CLOEXEC during a `ProcessBuilder` invocation. 
This test is fast and reliable, but it works only on Linux, since only with 
`pipe2(2)` we can completely close the !CLOEXEC time window. On MacOS and AIX, 
the test correctly spots untagged file descriptors since we still have the 
small time window between `pipe(2)` and `fcntl(2)`.

3) Finally, `FDLeaks` is a complementary test that checks that we don't 
accumulate unclosed file descriptors in ProcessBuilder.start().

- All tests in java/lang/ProcessBuilder were run, and all issues fixed. In 
particular, I tested:
        - Linux x64 with glibc
        - Linux x64 with muslc (Alpine)
        - MacOS Arm64
- Ran jdk + hotspot tier1 on Linux x64
- Tests on AIX are ongoing at SAP
- GHAs ran through successfully

[1] https://linux.die.net/man/2/pipe2

-------------

Commit messages:
 - wip
 - copyrights
 - new test
 - start

Changes: https://git.openjdk.org/jdk/pull/29939/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29939&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8377907
  Stats: 1250 lines in 16 files changed: 1123 ins; 54 del; 73 mod
  Patch: https://git.openjdk.org/jdk/pull/29939.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29939/head:pull/29939

PR: https://git.openjdk.org/jdk/pull/29939

Reply via email to