There is a very small, and very old, window for a race in the socket asynchronous close mechanism on Linux. This can lead to blocking socket operations never returning even after the socket has been closed.

This issue appears to have existed since linux_close.c was created back in 1.4, but because the window for the race is tiny, it seems to have gone unnoticed until now. It was originally diagnosed through code inspection, but I have since created and added a small test that reproduces the issue about once in every 10 to 20 runs, with jdk8, on Ubuntu 12.04, with 2x 2.33GHz Intel Xeon E5345 (2x quad-core, 1 thread per core => 8 threads).

closefd first interrupts (sends a wakeup signal to) all the threads blocked on the fd, then it closes/dup2's the fd. However, the signal may arrive at its target thread before that thread has entered the blocking system call, and before the close/dup2 has happened. In this case, the target thread will simply enter the blocking system call and never return.

Solution
---------
If closefd were to close/dup2 the fd before issuing the wakeup, then any thread not yet blocked in a system call should see on entry that the fd is closed, while any thread that is already blocked will be woken up by the signal.
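
As a rough illustration of that ordering, here is a minimal sketch. It is not the actual linux_close.c change: the names fd_entry_sketch, fd_entry_init and closefd_sketch are hypothetical, the dup2 path is left out, and it assumes the wakeup signal has a handler installed without SA_RESTART so that a blocked call returns with EINTR.

```c
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

#define MAX_WAITERS 16

/* Hypothetical per-fd bookkeeping: the threads currently blocked on the fd. */
typedef struct {
    pthread_mutex_t lock;
    pthread_t waiters[MAX_WAITERS];
    int nwaiters;
} fd_entry_sketch;

static void fd_entry_init(fd_entry_sketch *e) {
    pthread_mutex_init(&e->lock, NULL);
    e->nwaiters = 0;
}

/*
 * Close first, then wake. wakeup_signal is an assumption: any signal
 * whose handler does not restart the interrupted system call.
 */
static int closefd_sketch(fd_entry_sketch *e, int fd, int wakeup_signal) {
    int rv, i;

    pthread_mutex_lock(&e->lock);
    assert(e->nwaiters >= 0 && e->nwaiters <= MAX_WAITERS);

    /* Step 1: close the fd, so a thread that has not yet entered its
     * blocking call fails on entry instead of blocking. */
    rv = close(fd);

    /* Step 2: signal the threads that are already blocked on the fd. */
    for (i = 0; i < e->nwaiters; i++) {
        pthread_kill(e->waiters[i], wakeup_signal);
    }

    pthread_mutex_unlock(&e->lock);
    return rv;
}
```

With this ordering, a thread that receives the signal early and only then calls read() or accept() on the fd gets an error straight away, because the fd is already closed by the time the signal is sent.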

While there is an equivalent closefd in bsd_close.c (mac/bsd specific code), I have not been able to reproduce this issue after many test runs on mac. Also, making similar changes to closefd in bsd_close.c runs into a problem with dup2: dup2 will hang if another thread is doing a blocking operation. I believe this issue is similar to 7133499. So, as far as this issue is concerned, changes will only be made to the Linux version of closefd.

Webrev
-------

http://cr.openjdk.java.net/~chegar/8006395/webrev.00/webrev/

-Chris.
