9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client

2017-04-06 Thread Daniel Fuchs

Hi,

While analyzing some of the later failure traces reported for
https://bugs.openjdk.java.net/browse/JDK-8170940
8170940: test/java/net/httpclient/TimeoutOrdering.java
 failing intermittently
I began to suspect that this was not caused by a test bug.

Looking at the timeout handling code in httpclient,
I have identified a number of race conditions
in the http/2 client timeout logic.
The races can be observed quite easily by supplying a very
small timeout (1ms). In that case the new SmallTimeout test
in the webrev below, which is a modified clone
of TimeOrdering, will always hang.

webrev:
http://cr.openjdk.java.net/~dfuchs/webrev_8170940/webrev.00/

I also suspect these races to be at the root cause of
https://bugs.openjdk.java.net/browse/JDK-8170940
8170940: test/java/net/httpclient/TimeoutOrdering.java
 failing intermittently
so I am planning to close 8170940 as a duplicate of 8178147
when 8178147 is pushed.

I would like to propose this be fixed for JDK 9.

best regards,

-- daniel


Re: 9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client

2017-04-06 Thread Pavel Rappo
Heya Daniel,

750 /**
751  * same as above but for errors
752  */
753 void completeResponseExceptionally(Throwable t) {
754 synchronized (response_cfs) {
755 // use index to avoid ConcurrentModificationException
756 // caused by removing the CF from within the loop.
757 for (int i = 0; i < response_cfs.size(); i++) {
758 CompletableFuture cf = response_cfs.get(i);
759 if (!cf.isDone()) {
760 cf.completeExceptionally(t);
761 response_cfs.remove(i);
762 return;
763 }
764 }
765 response_cfs.add(MinimalFuture.failedFuture(t));
766 }
767 }

I was wondering if @762 should be there in the first place. The logic seems
a bit odd: find the first CF that hasn't been yet completed, complete it
and return. Are we guaranteed there are no other not-yet-completed CFs in
this list? Or we simply do not care about it?

Thanks.



Re: 9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client

2017-04-06 Thread Daniel Fuchs

Hi Pavel,

Thanks for taking a look!

On 06/04/2017 13:28, Pavel Rappo wrote:

Heya Daniel,

750 /**
751  * same as above but for errors
752  */
753 void completeResponseExceptionally(Throwable t) {
754 synchronized (response_cfs) {
755 // use index to avoid ConcurrentModificationException
756 // caused by removing the CF from within the loop.
757 for (int i = 0; i < response_cfs.size(); i++) {
758 CompletableFuture cf = response_cfs.get(i);
759 if (!cf.isDone()) {
760 cf.completeExceptionally(t);
761 response_cfs.remove(i);
762 return;
763 }
764 }
765 response_cfs.add(MinimalFuture.failedFuture(t));
766 }
767 }

I was wondering if @762 should be there in the first place. The logic seems
a bit odd: find the first CF that hasn't been yet completed, complete it
and return. Are we guaranteed there are no other not-yet-completed CFs in
this list? Or we simply do not care about it?


AFAIU this takes care of requests that can require multiple responses
such has 100-continue. In such a scenario, the responses cannot cross
each others, so completing the first uncompleted CF is actually
correct, because it should be the last CF in the list.
Return is needed here because 1. we are removing something from the
list, so after that the next values of 'i' will be invalid, and 2. we
have found our uncompleted response, which is the one created by
getResponseAsync() - so we don't need (and must not) add a new one
(we must not execute line 765).

I do believe there is room for simplification in the way these race
conditions between getResponseAsync() and completeResponse() are
handled, but I'd rather leave such improvement for 10 (or for the
next update).

The only reason I touched this method is because I noticed the
possibility of ConcurrentModificationException if we reached there.

cheers,

-- daniel



Thanks.





Re: 9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client

2017-04-06 Thread Daniel Fuchs

On 06/04/2017 14:08, Daniel Fuchs wrote:

The only reason I touched this method is because I noticed the
possibility of ConcurrentModificationException if we reached there.


Well - now that I think about it again CME would probably not have
happened with the old code either since we broke out of the loop
with return just after the modification. But seeing remove() within
a foreach loop was just a red flag

best,

-- daniel


Re: 9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client

2017-04-06 Thread Pavel Rappo

> On 6 Apr 2017, at 14:08, Daniel Fuchs  wrote:
> 
> Return is needed here because 1. we are removing something from the
> list, so after that the next values of 'i' will be invalid, and 2.

Yeah, that would be my second question :-) But you explained the purpose,
so no problem here. Thanks. 

Though it indeed jumps out on you. Could we use an iterator here and rely
on its remove implementation, since we know the implementation of the list?



Re: 9 RFR: 8178147: Race conditions in timeout handling code in http/2 incubator client

2017-04-06 Thread Daniel Fuchs

On 06/04/2017 14:23, Pavel Rappo wrote:

Though it indeed jumps out on you. Could we use an iterator here and rely
on its remove implementation, since we know the implementation of the list?



We could - there's another loop earlier in this file that uses
an index (line 713) - so I just choose to do the same here.
I'm not a great fan of iterators ;-)

best,

-- daniel


RFR [9] 8178161: Default multicast interface on Mac

2017-04-06 Thread Chris Hegarty
Because of some peculiarities on Mac the original Mac port brought
some code that attempts to determine the default network interface
( on Mac only ). In all cases, on recent OS and hardware, it now finds
the Apple peer-to-peer interface, awdl0, which is almost always the
wrong answer. This is fragile code, and using the more modern APIs
that support specifying an interface will avoid most of the problems,
but it is alas still used in many cases. A quick internet search shows
numerous reports of issues around this. We are also running into 
issues in internal testing.

It is possible for the heuristic to do a little better ( but not too much ),
by selecting an interface that supports both IPv4 and IPv6, rather
than just returning the first reasonable one it encounters. This will
better support the common case where a MulticastSocket, bound
to the wildcard address, tries to join a multicast group.

http://cr.openjdk.java.net/~chegar/defaultInterface/

Note: this it not a problem when joining a group using the joinGroup
method that accepts a NetworkInterface ( to join the group on ), or
using the new NIO MulticastChannel interface. The
MulticastSocket.joinGroup(InetAddress) are very much legacy, and
should probably be deprecated at some point in the near future. For
this issue I’d like to keep the changes as conservative as possible,
and possibly revisit this in 10.

-Chris.



Re: RFR [9] 8178161: Default multicast interface on Mac

2017-04-06 Thread Michael McMahon

Looks fine to me Chris. Stylistically, the boolean tests
in line 104 could remove the == true obviously, but not a big deal.

Michael.

On 06/04/2017, 17:15, Chris Hegarty wrote:

Because of some peculiarities on Mac the original Mac port brought
some code that attempts to determine the default network interface
( on Mac only ). In all cases, on recent OS and hardware, it now finds
the Apple peer-to-peer interface, awdl0, which is almost always the
wrong answer. This is fragile code, and using the more modern APIs
that support specifying an interface will avoid most of the problems,
but it is alas still used in many cases. A quick internet search shows
numerous reports of issues around this. We are also running into
issues in internal testing.

It is possible for the heuristic to do a little better ( but not too much ),
by selecting an interface that supports both IPv4 and IPv6, rather
than just returning the first reasonable one it encounters. This will
better support the common case where a MulticastSocket, bound
to the wildcard address, tries to join a multicast group.

http://cr.openjdk.java.net/~chegar/defaultInterface/

Note: this it not a problem when joining a group using the joinGroup
method that accepts a NetworkInterface ( to join the group on ), or
using the new NIO MulticastChannel interface. The
MulticastSocket.joinGroup(InetAddress) are very much legacy, and
should probably be deprecated at some point in the near future. For
this issue I’d like to keep the changes as conservative as possible,
and possibly revisit this in 10.

-Chris.



Re: RFR [9] 8178161: Default multicast interface on Mac

2017-04-06 Thread Chris Hegarty

> On 6 Apr 2017, at 17:50, Michael McMahon  wrote:
> 
> Looks fine to me Chris. Stylistically, the boolean tests
> in line 104 could remove the == true obviously, but not a big deal.

Thanks, I’ll make this change before pushing.

-Chris.

Re: RFR [9] 8178161: Default multicast interface on Mac

2017-04-06 Thread Brian Burkhalter
On Apr 6, 2017, at 11:07 AM, Chris Hegarty  wrote:

>> On 6 Apr 2017, at 17:50, Michael McMahon  
>> wrote:
>> 
>> Looks fine to me Chris. Stylistically, the boolean tests
>> in line 104 could remove the == true obviously, but not a big deal.
> 
> Thanks, I’ll make this change before pushing.

+1

Thanks,

Brian