Hi Vyom, this looks good.
Is there any particular reason why NET_ReadWithTimeout should remain in SocketInputStream.c and not also move to net_util_md.c? That way you could also have a "static long getCurrentTime()" inside net_util_md.c, instead of an exported NET_GetCurrentTime(). And no "#include <sys/time.h>" would be needed in SocketInputStream.c - maybe not even now as it is. Best regards Christoph > -----Original Message----- > From: Vyom Tewari [mailto:vyom.tew...@oracle.com] > Sent: Donnerstag, 8. September 2016 05:20 > To: Langer, Christoph <christoph.lan...@sap.com> > Cc: net-dev@openjdk.java.net > Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even with > soTimeout set > > Hi All, > > Please find the latest > webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.5/index.html > <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.5/index.html>). > > I fixed the AIX flag issue and move some come common code to > "net_util_md.c" file. > > Thanks, > Vyom > > On 9/8/2016 12:32 PM, Langer, Christoph wrote: > > Hi Vyom, > > > > ok, thanks. I have one addition to my proposal: I don't think there's a > > need for > a global NET_GetCurrentTime function. This one should probably be a static > function inside net_util_md.c (static long getCurrentTime()). > > > > Best > > Christoph > > > >> -----Original Message----- > >> From: Vyom Tewari [mailto:vyom.tew...@oracle.com] > >> Sent: Mittwoch, 7. September 2016 18:31 > >> To: Langer, Christoph <christoph.lan...@sap.com> > >> Cc: net-dev@openjdk.java.net; Chris Hegarty <chris.hega...@oracle.com> > >> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even > with > >> soTimeout set > >> > >> Hi Chris, > >> > >> Sorry I was mostly focusing on our test failure first, i will check > >> your suggestions. > >> > >> Thanks, > >> > >> Vyom > >> > >> > >> On Wednesday 07 September 2016 08:44 PM, Langer, Christoph wrote: > >>> Hi Vyom, > >>> > >>> did you have a look at my suggestions regarding AIX and refactoring? I > can't > >> find none of it in the new webrev nor did you comment on it. > >>> Best regards > >>> Christoph > >>> > >>>> -----Original Message----- > >>>> From: net-dev [mailto:net-dev-boun...@openjdk.java.net] On Behalf Of > >> Vyom > >>>> Tewari > >>>> Sent: Mittwoch, 7. September 2016 17:10 > >>>> To: Chris Hegarty <chris.hega...@oracle.com> > >>>> Cc: net-dev@openjdk.java.net > >>>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even > >> with > >>>> soTimeout set > >>>> > >>>> Hi All, > >>>> > >>>> Please find the latest > >>>> > webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.4/index.html > >>>> > <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.4/index.html>). > >>>> there were some test failures in JPRT and Chris also pointed the same > >>>> issue in modified code. Now with latest code JPRT is clean. > >>>> > >>>> Thanks, > >>>> > >>>> Vyom > >>>> > >>>> > >>>> On Wednesday 07 September 2016 03:18 PM, Chris Hegarty wrote: > >>>>> On 07/09/16 07:45, Vyom Tewari wrote: > >>>>>> Hi Chris, > >>>>>> > >>>>>> Please find the latest > >>>>>> > >> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html > >> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). I > >>>>>> did some code refactoring, JPRT is in progress. > >>>>> In terms of NET_Timeout**, I'm happy with this version, but I > >>>>> have an additional comment. > >>>>> > >>>>> If NET_ReadWithTimeout returns -1 because > >> NET_TimeoutWithCurrentTime > >>>>> returns <= 0, then a JNI pending exception will be set. So the code > >>>>> calling NET_ReadWithTimeout should, a) free bufP, and b) return -1, > >>>>> immediately rather than continuing and possibly trying to set another > >>>>> JNI pending exception. > >>>>> > >>>>> One option is to check the return value from NET_ReadWithTimeout and > >>>>> also (*env)->ExceptionCheck(env). Another is to possibly consolidate > >>>>> the setting of JNI pending exceptions in one place, towards the end > >>>>> of Java_java_net_SocketInputStream_socketRead0, for example EBADF > is > >>>>> handled in two places. > >>>>> > >>>>> -Chris. > >>>>> > >>>>>> I need help from some one to build and test latest changes on AIX, may > >>>>>> be Christoph can help. > >>>>>> > >>>>>> Thanks, > >>>>>> > >>>>>> Vyom > >>>>>> > >>>>>> > >>>>>> On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote: > >>>>>>> Vyom, > >>>>>>> > >>>>>>>> On 6 Sep 2016, at 10:20, Vyom Tewari <vyom.tew...@oracle.com> > >> wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> Please find the latest > >>>>>>>> > >>>> > webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html > >>>> > <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>). > >>>>>>>> I > >>>>>>>> have incorporated the review comments. > >>>>>>> Your changes completely avoid NET_Timeout, which is quite complex > on > >>>>>>> Linux, Mac, and AIX, for various reasons. ( NET_Timeout will use the > >>>>>>> async close mechanism to signal/interrupt a thread blocked in a poll / > >>>>>>> select ). Also, select is used on Mac, which will use poll after your > >>>>>>> changes ( I do remember that there was a bug in poll on Mac around > >>>>>>> the time of the original Mac port ). > >>>>>>> > >>>>>>> Would is be better, rather than replicating the logic in NET_Timeout, > >>>>>>> to have a version that supports passing a function pointer, to run if > >>>>>>> the poll/select returns before the timeout? Just an idea. > >>>>>>> > >>>>>>> -Chris. > >>>>>>> > >>>>>>>> Thanks, > >>>>>>>> > >>>>>>>> Vyom > >>>>>>>> > >>>>>>>> > >>>>>>>> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote: > >>>>>>>>> On 05/09/16 15:37, Mark Sheppard wrote: > >>>>>>>>>> if the desire is to avoid making double calls on gettimeofday in > >>>>>>>>>> the > >>>>>>>>>> NET_ReadWithTimeout's while loop for its main call flow, > >>>>>>>>> Yes, the desire is to make no more calls to gettimeofday, than are > >>>>>>>>> already done. > >>>>>>>>> > >>>>>>>>>> then consider a modification to NET_Timeout and create a version > >>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct > timeval * > >>>>>>>>>> current_time) > >>>>>>>>>> > >>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct > timeval * > >>>>>>>>>> current_time) { > >>>>>>>>>> int result; > >>>>>>>>>> struct timeval t; > >>>>>>>>>> long prevtime, newtime; > >>>>>>>>>> struct pollfd pfd; > >>>>>>>>>> pfd.fd = s; > >>>>>>>>>> pfd.events = POLLIN; > >>>>>>>>>> > >>>>>>>>>> if (timeout > 0) { > >>>>>>>>>> prevtime = (current_time->tv_sec * 1000) + > >>>>>>>>>> current_time->tv_usec / 1000; > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> for(;;) { > >>>>>>>>>> result = poll(&pfd, 1, timeout); > >>>>>>>>>> if (result < 0 && errno == EINTR) { > >>>>>>>>>> if (timeout > 0) { > >>>>>>>>>> gettimeofday(&t, NULL); > >>>>>>>>>> newtime = (t.tv_sec * 1000) + t.tv_usec /1000; > >>>>>>>>>> timeout -= newtime - prevtime; > >>>>>>>>>> if (timeout <= 0) > >>>>>>>>>> return 0; > >>>>>>>>>> prevtime = newtime; > >>>>>>>>>> } > >>>>>>>>>> } else { > >>>>>>>>>> return result; > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> the NET_ReadWithTimeout is modified with. > >>>>>>>>>> > >>>>>>>>>> while (timeout > 0) { > >>>>>>>>>> result = NET_TimeoutWithCurrentTime(fd, timeout, &t); > >>>>>>>>>> > >>>>>>>>>> in any case there is the potential for multiple invocation of > >>>>>>>>>> gettimeofday due to a need to make > >>>>>>>>>> poll restartable, > >>>>>>>>> Yes, and that is fine. Just no more than is already there. > >>>>>>>>> > >>>>>>>>> and adjust the originally specified timeout > >>>>>>>>>> accordingly, and for the edge case > >>>>>>>>>> after the non blocking read. > >>>>>>>>> After an initial skim, your suggestion seems reasonable. > >>>>>>>>> > >>>>>>>>> -Chris. > >>>>>>>>> > >>>>>>>>>> regards > >>>>>>>>>> Mark > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On 05/09/2016 12:02, Chris Hegarty wrote: > >>>>>>>>>>> Vyom, > >>>>>>>>>>> > >>>>>>>>>>> > >>>> > webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html > >>>>>>>>>>> > >>>>>>>>>>> There is some concern about the potential performance effect, > etc, > >>>>>>>>>>> of gettimeofday, maybe there is a way out of this. The reuse of > >>>>>>>>>>> NET_Timeout is good, but it also calls gettimeofday. It seems that > >>>>>>>>>>> a specific NET_ReadWithTimeout could be written to NOT reuse > >>>>>>>>>>> NET_Timeout, but effectively inline its interruptible operation. > >>>>>>>>>>> Or write a variant of NET_Timeout that takes a function to > >>>>>>>>>>> execute. Rather than effectively two loops conditioned on the > >>>>>>>>>>> timeout. Disclaimer: I have not actually tried to do this, but > >>>>>>>>>>> it seems worth considering / evaluating. > >>>>>>>>>>> > >>>>>>>>>>> -Chris. > >>>>>>>>>>> > >>>>>>>>>>> On 02/09/16 04:39, Vyom Tewari wrote: > >>>>>>>>>>>> hi Dimitry, > >>>>>>>>>>>> > >>>>>>>>>>>> thanks for review, I did consider to use a monotonically > >>>>>>>>>>>> increasing > >>>>>>>>>>>> clock like "clock_gettime" but existing nearby > >>>>>>>>>>>> code("NET_Timeout") uses > >>>>>>>>>>>> "gettimeofday" so i choose to be consistent with the existing > >>>>>>>>>>>> code. > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> > >>>>>>>>>>>> Vyom > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff > wrote: > >>>>>>>>>>>>> Vyom, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Did you consider to use select() to calculate timeout instead of > >>>>>>>>>>>>> gettimeofday ? > >>>>>>>>>>>>> > >>>>>>>>>>>>> gettimeofday is affected by system time changes, so running > ntpd > >>>>>>>>>>>>> can > >>>>>>>>>>>>> cause unpredictable behavior of this code. Also it's rather > >>>>>>>>>>>>> expensive > >>>>>>>>>>>>> syscall. > >>>>>>>>>>>>> > >>>>>>>>>>>>> -Dmitry > >>>>>>>>>>>>> > >>>>>>>>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote: > >>>>>>>>>>>>>> please find the updated > >>>>>>>>>>>>>> > >>>> > webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>> > <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>). > >>>>>>>>>>>>>> I > >>>>>>>>>>>>>> incorporated the review comments. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Vyom > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote: > >>>>>>>>>>>>>>> Hi > >>>>>>>>>>>>>>> perhaps there is an opportunity to do some refactoring > >>>>>>>>>>>>>>> here (... > >>>>>>>>>>>>>>> for me a "goto " carries a code smell! ) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> along the lines > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> if (timeout) { > >>>>>>>>>>>>>>> nread = NET_ReadWithTimeout(...); > >>>>>>>>>>>>>>> } else { > >>>>>>>>>>>>>>> nread = NET_Read(...); > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> the NET_ReadWithTimeout (...) function will contain a > >>>>>>>>>>>>>>> restructuring of > >>>>>>>>>>>>>>> your goto loop > >>>>>>>>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout); > >>>>>>>>>>>>>>> if (nread <= 0) { > >>>>>>>>>>>>>>> if (nread == 0) { > >>>>>>>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG > >>>>>>>>>>>>>>> "SocketTimeoutException", > >>>>>>>>>>>>>>> "Read timed out"); > >>>>>>>>>>>>>>> } else if (nread == -1) { > >>>>>>>>>>>>>>> if (errno == EBADF) { > >>>>>>>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG > >>>>>>>>>>>>>>> "SocketException", "Socket closed"); > >>>>>>>>>>>>>>> } else if (errno == ENOMEM) { > >>>>>>>>>>>>>>> JNU_ThrowOutOfMemoryError(env, > >>>>>>>>>>>>>>> "NET_Timeout > >>>>>>>>>>>>>>> native heap allocation failed"); > >>>>>>>>>>>>>>> } else { > >>>>>>>>>>>>>>> JNU_ThrowByNameWithMessageAndLastError > >>>>>>>>>>>>>>> (env, JNU_JAVANETPKG > >>>>>>>>>>>>>>> "SocketException", > >>>>>>>>>>>>>>> "select/poll failed"); > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> // release buffer in main call flow > >>>>>>>>>>>>>>> // if (bufP != BUF) { > >>>>>>>>>>>>>>> // free(bufP); > >>>>>>>>>>>>>>> // } > >>>>>>>>>>>>>>> nread = -1; > >>>>>>>>>>>>>>> break; > >>>>>>>>>>>>>>> } else { > >>>>>>>>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len); > >>>>>>>>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno == > >>>>>>>>>>>>>>> EWOULDBLOCK))) { > >>>>>>>>>>>>>>> gettimeofday(&t, NULL); > >>>>>>>>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000; > >>>>>>>>>>>>>>> _timeout -= newtime - prevtime; > >>>>>>>>>>>>>>> if(_timeout > 0){ > >>>>>>>>>>>>>>> prevtime = newtime; > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> } else { break; } } } return nread; > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> e&oe > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> regards > >>>>>>>>>>>>>>> Mark > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote: > >>>>>>>>>>>>>>>> gentle reminder, please review the below code change. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Vyom > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote: > >>>>>>>>>>>>>>>>> Hi All, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Please review the code changes for below issue. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Bug : > >>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075484 > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> webrev : > >>>>>>>>>>>>>>>>> > >>>> http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html > >>>>>>>>>>>>>>>>> > >>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even > >> with > >>>>>>>>>>>>>>>>> "soTimeout" set.the implementation of > >>>>>>>>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes > >> that > >>>>>>>>>>>>>>>>> read() > >>>>>>>>>>>>>>>>> won't block after poll() reports that a read is possible. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> This assumption does not hold, as noted on the man page > for > >>>>>>>>>>>>>>>>> select > >>>>>>>>>>>>>>>>> (referenced by the man page for poll): Under Linux, > select() > >>>>>>>>>>>>>>>>> may > >>>>>>>>>>>>>>>>> report a socket file descriptor as "ready for reading", > while > >>>>>>>>>>>>>>>>> nevertheless a subsequent read blocks. This could for > >> example > >>>>>>>>>>>>>>>>> happen > >>>>>>>>>>>>>>>>> when data has arrived but upon examination has wrong > >>>> checksum > >>>>>>>>>>>>>>>>> and is > >>>>>>>>>>>>>>>>> discarded. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Vyom > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>