Daniel, The changes in TransportImpl look okay to me. I cannot see how they break anything. On the other hand, I guess it will take some time to see if the (complete) changeset fixes the issue.
One minor thing that makes me a tad bit uncomfortable is the new assert statements. Firstly, they may have visibility side effects. Secondly, they could be more informative. May I suggest we use this? ChannelState s; assert (s = writeState.get()) == CLOSED : s; Or better still, ChannelState s = writeState.get(); assert s == CLOSED : s; -Pavel > On 23 Jul 2020, at 16:02, Daniel Fuchs <daniel.fu...@oracle.com> wrote: > > Hi, > > More testing revealed that some other tests of the same family > kept on failing intermittently, though my changes to > PendingOperation.java should have fixed them. > > So here is a broader fix - which seems to have fixed the issue. > But as a consequence - I am no longer planning to push it to > 15 as it also changes some source files: > > http://cr.openjdk.java.net/~dfuchs/webrev_8249786/webrev.02/ > > best regards, > > -- daniel > > On 21/07/2020 18:53, Daniel Fuchs wrote: >> Hi, >> Please find below a fix for: >> 8249786: java/net/httpclient/websocket/PendingPingTextClose.java >> fails very infrequently >> https://bugs.openjdk.java.net/browse/JDK-8249786 >> webrev: >> http://cr.openjdk.java.net/~dfuchs/webrev_8249786/webrev.00/ >> This test has been observed failing on windows in the JDK 15 CI. >> The most troublesome issue is that the test was producing so >> much output that the actual reason for the failure was lost >> in the output overflow. >> After instrumenting the test to limit the output and >> adding a few higher level traces, I was able to reproduce >> once (out of 250 runs) and see the actual stack trace. >> The test fails just after calling websocket.abort() when it tries >> to verify that the cfPing CompletableFuture completed >> exceptionally, and found that it actually successfully completed. >> The logic of the test is to try to fill up the local send buffer >> by sending ping messages, so that an attempt to write to the >> socket will block. >> For that it creates a server that will accept a websocket >> connection, but will not read anything from the socket input. >> The client side sends ping packets until the socket buffer >> is full - which is detected by setting up a 10s timeout and >> observing that the ping data could not be written during >> this time. The assumption of the test is that a write call >> that takes more than 10s is indicative that the buffers are >> full, and will never succeed. >> The problem occurs when the write succeeds after ~10s either >> because the kernel was busy doing some other things, or because >> the kernel suddenly decided to resize (increase) the buffers, >> which causes the write call to unblock and succeed after 10s. >> The test already had some provision and a workaround for that >> issue - via a repeatable( ) operation - but the workaround >> was only enabled for macOS where such behavior had first been >> observed. >> The fix extends that workaround for Windows - since the later >> failure shows that something similar is happening there. >> The fix also moves the websocket.abort() and following check >> inside the repeatable loop for better reliability. >> Since pushing test fixes during rampdown 2 is permitted, >> and since the failure was observed in the JDK 15 CI, I'm >> planning to push this test fix to the jdk15 repo, >> unless I hear any objection. >> best regards, >> -- daniel >