There is a very small, and very old, race window in the asynchronous
socket close mechanism on Linux. It can cause blocking socket operations
to never return, even after the socket has been closed.

This issue appears to have existed since linux_close.c was created, back
in 1.4, but since the race window is tiny it seems to have gone unnoticed
until now. It was originally diagnosed through code inspection, but I
have since created and added a small test that reproduces the issue about
once in every 10-20 runs, with jdk8, on Ubuntu 12.04, with 2x 2.33GHz
Intel Xeon E5345 (2x quad-core, 1 thread per core => 8 threads).

closefd first interrupts (sends a wakeup signal to) all the threads
blocked on the fd, and then closes/dup2's the fd. However, the signal may
arrive at its target thread before that thread has entered the blocking
system call, and before the close/dup2. In that case the wakeup is lost:
the target thread simply enters the blocking system call on the
still-open fd and never returns.
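
The lost wakeup can be seen in isolation with a single thread, since
nothing prevents the signal from being handled just before the blocking
call starts. The C sketch below is not taken from linux_close.c; the
names read_after_early_wakeup and on_wakeup, the use of SIGUSR1, and the
200 ms SO_RCVTIMEO are assumptions for illustration. The timeout exists
only so that the sketch finishes instead of hanging the way the real bug
does.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Counts deliveries of the wakeup signal. The handler does nothing else:
   its only job is to interrupt a blocking system call. */
static volatile sig_atomic_t wakeups = 0;

static void on_wakeup(int sig) {
    (void)sig;
    wakeups++;
}

/* Delivers the wakeup signal BEFORE read() starts, then reads from an idle
   socket. Returns read()'s result and stores its errno in *err. The 200 ms
   SO_RCVTIMEO is a sketch-only assumption so this ends instead of hanging. */
static ssize_t read_after_early_wakeup(int *err) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_wakeup;       /* no SA_RESTART: a blocked read() would get EINTR */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0) return -2;

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -2;
    struct timeval tv = { .tv_sec = 0, .tv_usec = 200000 };
    setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    raise(SIGUSR1);                  /* the wakeup arrives; the handler runs now... */

    char c;
    ssize_t n = read(sv[0], &c, 1);  /* ...so this read() still blocks until the timeout */
    *err = errno;
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

On Linux, read_after_early_wakeup() should return -1 with errno EAGAIN
(the receive timeout expiring) rather than EINTR, even though the handler
has already run once: the early signal did nothing to stop the thread
from blocking.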

Solution
---------
If closefd instead closes/dup2's the fd before issuing the wakeup, then
any thread that has not yet entered the blocking system call will see
that the fd is closed on entry, and any thread already blocked in it
will be woken up by the signal.
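
The reordering can be sketched as follows, assuming a simplified model of
the per-fd thread list rather than the webrev's actual code. The types
thread_entry and fd_entry and the functions tracked_read, closefd_fixed,
make_marker_fd and run_demo are hypothetical, as are the use of SIGUSR2
and the 50 ms delay in the demo. Only the dup2 path is modelled: the fd
is replaced by an assumed "marker" fd, a shut-down socket whose peer is
closed, so reads on it return EOF immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define WAKEUP_SIG SIGUSR2          /* assumption: a signal the program does not otherwise use */

/* Hypothetical per-fd bookkeeping: threads inside a blocking call on the fd. */
typedef struct thread_entry {
    pthread_t thr;
    int intr;                       /* set if the fd was closed while in use */
    struct thread_entry *next;
} thread_entry;

typedef struct {
    pthread_mutex_t lock;
    thread_entry *threads;
} fd_entry;

static void no_op(int sig) { (void)sig; }   /* exists only to interrupt a syscall */

/* Marker fd: a shut-down socket whose peer is closed, so reads return EOF at once. */
static int make_marker_fd(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
    shutdown(sv[0], SHUT_RDWR);
    close(sv[1]);
    return sv[0];
}

/* A blocking read bracketed by register/unregister steps on the fd entry. */
static ssize_t tracked_read(fd_entry *e, int fd, void *buf, size_t len) {
    thread_entry self = { pthread_self(), 0, NULL };
    pthread_mutex_lock(&e->lock);
    self.next = e->threads;
    e->threads = &self;
    pthread_mutex_unlock(&e->lock);
    ssize_t n = read(fd, buf, len);
    int saved_errno = errno;
    pthread_mutex_lock(&e->lock);
    thread_entry **pp = &e->threads;
    while (*pp != &self) pp = &(*pp)->next;
    *pp = self.next;                /* unlink this thread */
    int intr = self.intr;
    pthread_mutex_unlock(&e->lock);
    if (intr) { errno = EBADF; return -1; }   /* the fd was closed under us */
    errno = saved_errno;
    return n;
}

/* Fixed ordering: swap in the marker FIRST, then wake any registered threads. */
static int closefd_fixed(fd_entry *e, int marker, int fd) {
    int rv;
    pthread_mutex_lock(&e->lock);
    do {
        rv = dup2(marker, fd);
    } while (rv == -1 && errno == EINTR);
    for (thread_entry *t = e->threads; t != NULL; t = t->next) {
        t->intr = 1;
        pthread_kill(t->thr, WAKEUP_SIG);
    }
    pthread_mutex_unlock(&e->lock);
    return rv < 0 ? -1 : 0;
}

/* Demo: close the fd while a reader thread is (most likely) blocked on it. */
static fd_entry g_entry = { PTHREAD_MUTEX_INITIALIZER, NULL };
static int g_fd, g_errno;
static ssize_t g_result;

static void *reader(void *arg) {
    char c;
    (void)arg;
    g_result = tracked_read(&g_entry, g_fd, &c, 1);
    g_errno = errno;
    return NULL;
}

static int run_demo(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = no_op;          /* no SA_RESTART, so a blocked read() gets EINTR */
    sigemptyset(&sa.sa_mask);
    int sv[2], marker = make_marker_fd();
    if (marker < 0 || sigaction(WAKEUP_SIG, &sa, NULL) < 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
    g_fd = sv[0];
    pthread_t t;
    if (pthread_create(&t, NULL, reader, NULL) != 0) return -1;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 50000000 };  /* let it block */
    nanosleep(&delay, NULL);
    int rv = closefd_fixed(&g_entry, marker, g_fd);
    pthread_join(t, NULL);
    close(sv[0]); close(sv[1]); close(marker);
    return rv;
}
```

Whatever point the reader has reached when closefd_fixed runs, it
returns: either 0 from the marker (it registered after the close) or -1
with EBADF (it was blocked, or about to block, when the fd was replaced).
If the dup2 loop and the wakeup loop in closefd_fixed are swapped back to
the original order, a reader signalled just before its read() can end up
blocked on the old socket indefinitely.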

While there is an equivalent closefd in bsd_close.c (Mac/BSD-specific
code), I have not been able to reproduce this issue on Mac after many
test runs. Also, making a similar change to closefd in bsd_close.c runs
into a problem with dup2: dup2 will hang if another thread is performing
a blocking operation on the same fd. I believe this is similar to
7133499. So, as far as this issue is concerned, changes will only be
made to the Linux version of closefd.

Webrev
-------
http://cr.openjdk.java.net/~chegar/8006395/webrev.00/webrev/

-Chris.