Hi Alan,

Thanks again for testing this change. I dug deep into the issue yesterday and 
got some answers from the Windows Networking team.

The issue is that the flag TCP_INITIAL_RTO_NO_SYN_RETRANSMISSIONS, which we 
passed in to completely eliminate the network delay, isn't defined (or checked) 
on Windows 10 versions prior to RS3 (Redstone 3). On those versions the flag 
was interpreted as TCP_INITIAL_RTO_DEFAULT_MAX_SYN_RETRANSMISSIONS, which 
caused 255 retries of 500ms each. As a result, each individual connect took 
about 128 seconds in total.
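
To make the arithmetic concrete, here is a small sketch in C. The function 
names are illustrative and not part of the patch; only the 255 retries and the 
500ms interval come from the investigation above, and the fixed interval (no 
backoff) is an assumption taken from that description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Total time spent retransmitting SYNs, assuming every attempt waits the
 * same fixed interval (no backoff), as described in the text. */
uint32_t total_connect_delay_ms(uint32_t retries, uint32_t interval_ms)
{
    return retries * interval_ms;
}

/* True when a delay exceeds a caller-supplied budget, such as a test
 * timeout. */
bool exceeds_budget(uint32_t delay_ms, uint32_t budget_ms)
{
    return delay_ms > budget_ms;
}
```

Calling total_connect_delay_ms(255, 500) gives 127500 ms, i.e. 127.5 seconds, 
which matches the roughly 128 seconds observed and is far beyond the 10 second 
limit that testTimedConnect2 expects.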

I was advised to change the code to perform a runtime check on the exact 
Windows version and, unless it's Windows 10 RS3 or later, to set the 
retransmission count to 1. Strangely enough, we can't set it to 0, which is a 
special value interpreted as "use the default". With a retransmission count of 
1, we speed up the localhost connects on older versions of Windows by a factor 
of 2.
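
The selection logic described above could be sketched roughly as follows. This 
is not the code from the webrev: the function name and the SKETCH_ constants 
are hypothetical, and the numeric value used for the "no retransmissions" flag 
is an assumption that should be checked against <mstcpip.h>.

```c
#include <stdbool.h>

/* Illustrative stand-ins. On Windows the real constant comes from
 * <mstcpip.h>; the value used here is an assumption for this sketch. */
#define SKETCH_NO_SYN_RETRANSMISSIONS    ((unsigned char)-2)
#define SKETCH_SINGLE_SYN_RETRANSMISSION ((unsigned char)1)

/* Pick the SYN retransmission setting for loopback connects. */
unsigned char choose_max_syn_retransmissions(bool is_win10_rs3_or_later)
{
    if (is_win10_rs3_or_later) {
        /* The flag is understood from RS3 onwards. */
        return SKETCH_NO_SYN_RETRANSMISSIONS;
    }
    /* Older versions: 0 is not usable because it means "use the default",
     * so 1 is the smallest explicit count. */
    return SKETCH_SINGLE_SYN_RETRANSMISSION;
}
```

On Windows the returned value would presumably go into the 
MaxSynRetransmissions field of the TCP_INITIAL_RTO_PARAMETERS structure that 
is passed to WSAIoctl with SIO_TCP_INITIAL_RTO.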

I have prepared a new webrev with the runtime check for review here:
http://cr.openjdk.java.net/~adityam/nikola/fast_connect_loopback_4/ 

For the Windows version check function I followed the naming standards the SDK 
uses in:
https://docs.microsoft.com/en-us/windows/win32/api/versionhelpers/

If it's not a suitable function name, please let me know. Microsoft has added 
this helper function for .NET 4.8, but it's not there yet for Win32. Hopefully 
it will be provided by Microsoft in a future SDK update so that we can remove 
our helper. I attempted to use IsWindowsVersionOrGreater, but unfortunately 
that API doesn't allow me to specify the build number needed to detect RS3.
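
For illustration, the comparison such a helper needs can be sketched in 
portable C as below. The struct and function names are hypothetical (the 
helper in the webrev may be named and implemented differently), and the sketch 
leaves out how the version triple is obtained from the OS. It relies on 
Windows 10 RS3 (version 1709) being build 16299.

```c
#include <stdbool.h>

/* Hypothetical container for a version triple; on Windows the fields would
 * be filled from an OS query, which this sketch leaves out. */
struct os_version {
    unsigned major;
    unsigned minor;
    unsigned build;
};

/* Lexicographic comparison: major first, then minor, then build. */
bool version_at_least(struct os_version v, struct os_version min)
{
    if (v.major != min.major) return v.major > min.major;
    if (v.minor != min.minor) return v.minor > min.minor;
    return v.build >= min.build;
}

/* Hypothetical name in the style of the versionhelpers.h functions. */
bool is_windows10_rs3_or_greater(struct os_version v)
{
    const struct os_version rs3 = { 10, 0, 16299 };
    return version_at_least(v, rs3);
}
```

Windows Server 2016 reports build 14393, so with this comparison it would take 
the pre-RS3 path.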

Thanks,
Nikola

-----Original Message-----
From: Alan Bateman <alan.bate...@oracle.com> 
Sent: July 26, 2020 6:45 AM
To: Nikola Grcevski <nikola.grcev...@microsoft.com>; net-dev@openjdk.java.net
Subject: Re: RFR(s): Improving performance of Windows socket connect on the 
loopback adapter

On 24/07/2020 16:20, Nikola Grcevski wrote:
> Thanks Alan, yes I'll need a sponsor for the patch.
>
>
I tried the patch in our CI and test/jdk/java/net/Socket/Timeouts.java
is consistently failing on Windows Server 2016 systems, specifically
testTimedConnect2 which expects a "connection refused" within 10s of attempting 
to connect to a port on the loopback that doesn't have any service running. 
SIO_TCP_INITIAL_RTO seems to be intended for desktop systems so I'm wondering 
if you can find out if there are any issues with using it on Windows Server 
editions.

-Alan
