9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client
Hi, While analyzing some of the later failure traces reported for https://bugs.openjdk.java.net/browse/JDK-8170940 8170940: test/java/net/httpclient/TimeoutOrdering.java failing intermittently I began to suspect that this was not caused by a test bug. Looking at the timeout handling code in httpclient, I have identified a number of race conditions in the http/2 client timeout logic. The races can be observed quite easily by supplying a very small timeout (1ms). In that case the new SmallTimeout test in the webrev below, which is a modified clone of TimeOrdering, will always hang. webrev: http://cr.openjdk.java.net/~dfuchs/webrev_8170940/webrev.00/ I also suspect these races to be at the root cause of https://bugs.openjdk.java.net/browse/JDK-8170940 8170940: test/java/net/httpclient/TimeoutOrdering.java failing intermittently so I am planning to close 8170940 as a duplicate of 8178147 when 8178147 is pushed. I would like to propose this be fixed for JDK 9. best regards, -- daniel
Re: 9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client
Heya Daniel, 750 /** 751 * same as above but for errors 752 */ 753 void completeResponseExceptionally(Throwable t) { 754 synchronized (response_cfs) { 755 // use index to avoid ConcurrentModificationException 756 // caused by removing the CF from within the loop. 757 for (int i = 0; i < response_cfs.size(); i++) { 758 CompletableFuture cf = response_cfs.get(i); 759 if (!cf.isDone()) { 760 cf.completeExceptionally(t); 761 response_cfs.remove(i); 762 return; 763 } 764 } 765 response_cfs.add(MinimalFuture.failedFuture(t)); 766 } 767 } I was wondering if @762 should be there in the first place. The logic seems a bit odd: find the first CF that hasn't been yet completed, complete it and return. Are we guaranteed there are no other not-yet-completed CFs in this list? Or we simply do not care about it? Thanks.
Re: 9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client
Hi Pavel, Thanks for taking a look! On 06/04/2017 13:28, Pavel Rappo wrote: Heya Daniel, 750 /** 751 * same as above but for errors 752 */ 753 void completeResponseExceptionally(Throwable t) { 754 synchronized (response_cfs) { 755 // use index to avoid ConcurrentModificationException 756 // caused by removing the CF from within the loop. 757 for (int i = 0; i < response_cfs.size(); i++) { 758 CompletableFuture cf = response_cfs.get(i); 759 if (!cf.isDone()) { 760 cf.completeExceptionally(t); 761 response_cfs.remove(i); 762 return; 763 } 764 } 765 response_cfs.add(MinimalFuture.failedFuture(t)); 766 } 767 } I was wondering if @762 should be there in the first place. The logic seems a bit odd: find the first CF that hasn't been yet completed, complete it and return. Are we guaranteed there are no other not-yet-completed CFs in this list? Or we simply do not care about it? AFAIU this takes care of requests that can require multiple responses such has 100-continue. In such a scenario, the responses cannot cross each others, so completing the first uncompleted CF is actually correct, because it should be the last CF in the list. Return is needed here because 1. we are removing something from the list, so after that the next values of 'i' will be invalid, and 2. we have found our uncompleted response, which is the one created by getResponseAsync() - so we don't need (and must not) add a new one (we must not execute line 765). I do believe there is room for simplification in the way these race conditions between getResponseAsync() and completeResponse() are handled, but I'd rather leave such improvement for 10 (or for the next update). The only reason I touched this method is because I noticed the possibility of ConcurrentModificationException if we reached there. cheers, -- daniel Thanks.
Re: 9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client
On 06/04/2017 14:08, Daniel Fuchs wrote: The only reason I touched this method is because I noticed the possibility of ConcurrentModificationException if we reached there. Well - now that I think about it again CME would probably not have happened with the old code either since we broke out of the loop with return just after the modification. But seeing remove() within a foreach loop was just a red flag best, -- daniel
Re: 9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client
> On 6 Apr 2017, at 14:08, Daniel Fuchs wrote: > > Return is needed here because 1. we are removing something from the > list, so after that the next values of 'i' will be invalid, and 2. Yeah, that would be my second question :-) But you explained the purpose, so no problem here. Thanks. Though it indeed jumps out on you. Could we use an iterator here and rely on its remove implementation, since we know the implementation of the list?
Re: 9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client
On 06/04/2017 14:23, Pavel Rappo wrote: Though it indeed jumps out on you. Could we use an iterator here and rely on its remove implementation, since we know the implementation of the list? We could - there's another loop earlier in this file that uses an index (line 713) - so I just choose to do the same here. I'm not a great fan of iterators ;-) best, -- daniel
RFR [9] 8178161: Default multicast interface on Mac
Because of some peculiarities on Mac the original Mac port brought some code that attempts to determine the default network interface ( on Mac only ). In all cases, on recent OS and hardware, it now finds the Apple peer-to-peer interface, awdl0, which is almost always the wrong answer. This is fragile code, and using the more modern APIs that support specifying an interface will avoid most of the problems, but it is alas still used in many cases. A quick internet search shows numerous reports of issues around this. We are also running into issues in internal testing. It is possible for the heuristic to do a little better ( but not too much ), by selecting an interface that supports both IPv4 and IPv6, rather than just returning the first reasonable one it encounters. This will better support the common case where a MulticastSocket, bound to the wildcard address, tries to join a multicast group. http://cr.openjdk.java.net/~chegar/defaultInterface/ Note: this it not a problem when joining a group using the joinGroup method that accepts a NetworkInterface ( to join the group on ), or using the new NIO MulticastChannel interface. The MulticastSocket.joinGroup(InetAddress) are very much legacy, and should probably be deprecated at some point in the near future. For this issue I’d like to keep the changes as conservative as possible, and possibly revisit this in 10. -Chris.
Re: RFR [9] 8178161: Default multicast interface on Mac
Looks fine to me Chris. Stylistically, the boolean tests in line 104 could remove the == true obviously, but not a big deal. Michael. On 06/04/2017, 17:15, Chris Hegarty wrote: Because of some peculiarities on Mac the original Mac port brought some code that attempts to determine the default network interface ( on Mac only ). In all cases, on recent OS and hardware, it now finds the Apple peer-to-peer interface, awdl0, which is almost always the wrong answer. This is fragile code, and using the more modern APIs that support specifying an interface will avoid most of the problems, but it is alas still used in many cases. A quick internet search shows numerous reports of issues around this. We are also running into issues in internal testing. It is possible for the heuristic to do a little better ( but not too much ), by selecting an interface that supports both IPv4 and IPv6, rather than just returning the first reasonable one it encounters. This will better support the common case where a MulticastSocket, bound to the wildcard address, tries to join a multicast group. http://cr.openjdk.java.net/~chegar/defaultInterface/ Note: this it not a problem when joining a group using the joinGroup method that accepts a NetworkInterface ( to join the group on ), or using the new NIO MulticastChannel interface. The MulticastSocket.joinGroup(InetAddress) are very much legacy, and should probably be deprecated at some point in the near future. For this issue I’d like to keep the changes as conservative as possible, and possibly revisit this in 10. -Chris.
Re: RFR [9] 8178161: Default multicast interface on Mac
> On 6 Apr 2017, at 17:50, Michael McMahon wrote: > > Looks fine to me Chris. Stylistically, the boolean tests > in line 104 could remove the == true obviously, but not a big deal. Thanks, I’ll make this change before pushing. -Chris.
Re: RFR [9] 8178161: Default multicast interface on Mac
On Apr 6, 2017, at 11:07 AM, Chris Hegarty wrote: >> On 6 Apr 2017, at 17:50, Michael McMahon >> wrote: >> >> Looks fine to me Chris. Stylistically, the boolean tests >> in line 104 could remove the == true obviously, but not a big deal. > > Thanks, I’ll make this change before pushing. +1 Thanks, Brian