On Wed, 16 Jun 2021 13:53:38 GMT, Daniel Fuchs <dfu...@openjdk.org> wrote:
> Hi,
>
> Please find below a test-only change to fix some intermittent failures observed with the httpclient/websocket tests: these tests intermittently and randomly fail with ENOBUFS ("No buffer space available").
>
> Some machines in our CI seem to allow a higher level of concurrency while being (maybe) configured with lower system resources (such as available buffer space for the TCP stack).
>
> Some of the httpclient/websocket tests attempt to fill the socket buffers in order to assert some conditions when the buffers are full and writing is paused. When the test process terminates, this leaves behind TCP sockets in the TIME_WAIT state that still hold system buffer resources in case retransmission is needed. When several such tests are run, this ends up causing random "No buffer space available" errors on other tests (including these tests themselves) running concurrently or shortly after on the same machine.
>
> This change implements a few tricks to alleviate the situation:
> - configure the tests with smaller send buffers on the client side and smaller receive buffers on the server side, in order to limit how much buffer space is consumed by the test.
> - when the not-reading server is closed, and before the accepted socket is closed, read all available data off the socket buffer in order to free up the buffer space that the test has consumed.
> - in some tests that create a large number of HttpClients, limit the number of clients created in shared client mode, and add a call to System.gc() and a small pause to give the GC time to collect the old clients that are no longer referenced.
>
> With these changes, I have run the HttpClient tests 200 times on the problematic machines without observing any failures (where previously there were at least a couple of failures per 50 runs). I also ran tier1 once and tier2 twice, and the results came back clean.
>
> I am therefore claiming success (even if it might prove temporary ;-) )
>
> If these failures come back to haunt the CI after this fix, a further remediation could be to put the httpclient/websocket directory in exclusive test execution mode (in TEST.ROOT). This seems to work too, but cleaning up garbage in the tests themselves seems preferable.

This pull request has now been integrated.

Changeset: 8ea0606a
Author:    Daniel Fuchs <dfu...@openjdk.org>
URL:       https://git.openjdk.java.net/jdk17/commit/8ea0606aba15911f5bfe2c81a83b42288d97095f
Stats:     93 lines in 12 files changed: 86 ins; 0 del; 7 mod

8268714: [macos-aarch64] 7 java/net/httpclient/websocket tests failed

Reviewed-by: chegar, michaelm

-------------

PR: https://git.openjdk.java.net/jdk17/pull/79
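The first two remediations described in the quoted message (small socket buffers, and draining a not-reading server's socket before closing it) can be illustrated with plain java.net sockets. This is a minimal sketch, not code from changeset 8ea0606a. The class SmallBufferSockets, its methods, the 16 KiB buffer size and the 100 ms drain timeout are all hypothetical. Bounding the drain with SO_TIMEOUT is one assumed way to "read all available data" without blocking forever on a peer that keeps the connection open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper class for illustration; not taken from the actual changeset.
final class SmallBufferSockets {

    // Hypothetical buffer size; the OS treats it as a hint and may round or clamp it.
    static final int SMALL_BUFFER = 16 * 1024;

    // Assumed value: how long the drain loop waits for more data before giving up.
    static final int DRAIN_TIMEOUT_MS = 100;

    /** A loopback server whose accepted sockets get a small receive buffer. */
    static ServerSocket newSmallBufferServer() throws IOException {
        ServerSocket server = new ServerSocket(); // created unbound
        // Set before bind(): the value applies to sockets returned by accept().
        server.setReceiveBufferSize(SMALL_BUFFER);
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return server;
    }

    /** A client socket with a small send buffer, connected to the given loopback port. */
    static Socket newSmallBufferClient(int port) throws IOException {
        Socket client = new Socket(); // created unconnected
        client.setSendBufferSize(SMALL_BUFFER);
        client.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        return client;
    }

    /**
     * Reads and discards whatever the peer has sent, stopping at end of stream or
     * once no data arrives for DRAIN_TIMEOUT_MS, then closes the socket.
     * Returns the number of bytes drained.
     */
    static long drainAndClose(Socket accepted) throws IOException {
        long drained = 0;
        try (accepted) {
            accepted.setSoTimeout(DRAIN_TIMEOUT_MS);
            InputStream in = accepted.getInputStream();
            byte[] buf = new byte[8192];
            int n;
            try {
                while ((n = in.read(buf)) != -1) {
                    drained += n;
                }
            } catch (SocketTimeoutException e) {
                // No more data within the timeout: the peer is idle, so stop draining.
            }
        }
        return drained;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = newSmallBufferServer();
             Socket client = newSmallBufferClient(server.getLocalPort())) {
            OutputStream out = client.getOutputStream();
            out.write(new byte[4_096]); // small enough to sit in the kernel buffers without blocking
            client.shutdownOutput();    // EOF lets the drain loop end without waiting for the timeout
            Socket accepted = server.accept(); // this side reads nothing until drainAndClose
            System.out.println("drained " + drainAndClose(accepted) + " bytes");
        }
    }
}
```

The values passed to setSendBufferSize and setReceiveBufferSize are hints; the sizes actually in effect can be checked afterwards with getSendBufferSize and getReceiveBufferSize. Running main should report that the 4,096 bytes written by the client were drained before the accepted socket was closed.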