On 05/01/2017 22:30, Mark Thomas wrote:
> On 05/01/2017 22:01, David Oswell wrote:
>> After some more digging I've been able to further narrow down the problem
>> somewhat, but still not able to pinpoint the exact cause;
>> The issue is not load related, but rather seems to be related to the timing
>> of the TCP connection being closed.
>> Depending on the timing the poller and exec appear to get into a loop -
>> drilled this down to an error status returned in AprEndpoint - which might
>> need to be thrown as an exception rather than return 0 from
>> the fillReadBuffer in AprEndpoint.
>>
>> Poller thread - AprEndpoint:1573 - Poller(ID 55) wakes up
>> Poller thread - AprEndpoint:1652 - Poller adds socket to timeout
>> Poller thread - AprEndpoint:1675 - Poller gets rv = 1
>> Poller thread - AprEndpoint:1694 - Poller gets connection (socket Id 565911936)
>> Poller thread - AprEndpoint:1731 - Poller processesSocket as socket event OPEN_READ
>> Poller thread - AbstractEndpoint:903 - executor executes
>>   AprEndpoint$SocketProcessor (id 63) -> No exception thrown.
>>
>> Exec thread - AprEndpoint:2403 - Socket.recvb result = -20014
>> Interesting comment ? at AprEndpoint:2445 :
>>     } else if (-result == Status.APR_EGENERAL && isSecure()) {
>>         //Status.APR_EGENERAL=20014
>>         // Not entirely sure why this is necessary. Testing to date has not
>>         // identified any issues with this but log it so it can be tracked
>>         // if it is suspected of causing issues in the future.
>>         if (log.isDebugEnabled()) {
>>             log.debug(sm.getString("socket.apr.read.sslGeneralError", getSocket(), this));
>>         }
>>         return 0;
>> Does this need to throw an exception to get caught higher up as an error?
>
> Oh great. That code.
>
> It originates here:
> http://svn.apache.org/viewvc?view=revision&revision=1534619
>
> For the background see this thread:
> http://tomcat.markmail.org/thread/4vspjutd4kzqkc5q
>
> As far as I could tell, something was happening in the TLS layer that
> APR/native was reporting as an error that wasn't really an error.
> Therefore, I changed Tomcat to ignore the report of an error and carried on.
>
> What I suspect is happening is that you are seeing a real error that
> Tomcat now isn't treating as an error.
>
> I do have a working build environment for tc-native on Windows now so
> this is probably worth a re-visit.
>
> I'll put this at the top of my TODO list for after the 9.0.x and 8.5.x
> releases I've been meaning to start for the last couple of days.
>
> Given how far you've got with this in a short time, if you wanted to
> continue digging that would be great. My suggestion for a way forward
> would be:
> - enable debug logging for AprEndpoint
> - try and recreate the 20014 error with the WebSocket drawing board
>   example (as described in the links above)
> - trace back into APR/native to figure out a) what is generating that
>   20014 error code and b) what it should really be generating.
>
> If you need to build APR/native, the instructions are here:
> https://cwiki.apache.org/confluence/display/TOMCAT/Building+the+Tomcat+Native+Connector+binaries+for+Windows
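
To make the suspected failure mode in the quoted text concrete, here is a minimal, self-contained sketch. It is not Tomcat code: ReadLoopSketch, nativeRecv, the two reader classes and dispatchCount are all hypothetical names made up for illustration, and the "poller" is reduced to a simple loop with a cap. It only models the idea that a persistent error status reported as "0 bytes read" can cause endless re-dispatch, whereas surfacing it as an exception ends the cycle.

```java
import java.io.IOException;

public class ReadLoopSketch {
    // Same numeric value as the status discussed above (Status.APR_EGENERAL = 20014).
    static final int APR_EGENERAL = 20014;

    /** Hypothetical stand-in for a native read that keeps failing with -20014. */
    static int nativeRecv() {
        return -APR_EGENERAL;
    }

    /** Hypothetical read abstraction; returns bytes read, 0 meaning "nothing yet". */
    interface Reader {
        int fill() throws IOException;
    }

    /** Variant A: the error status is swallowed and reported as "nothing read". */
    static final class SwallowingReader implements Reader {
        public int fill() {
            int result = nativeRecv();
            if (-result == APR_EGENERAL) {
                return 0; // caller cannot tell this apart from "no data yet"
            }
            return result;
        }
    }

    /** Variant B: any negative status is surfaced as an exception. */
    static final class ThrowingReader implements Reader {
        public int fill() throws IOException {
            int result = nativeRecv();
            if (result < 0) {
                throw new IOException("read failed, status " + (-result));
            }
            return result;
        }
    }

    /**
     * Hypothetical poller/executor cycle: a read of 0 sends the socket back to
     * the poller, which (in this model) reports it readable again at once.
     * Returns how many times the socket was dispatched before the cycle ended
     * or the cap was reached.
     */
    static int dispatchCount(Reader reader, int maxDispatches) {
        int dispatches = 0;
        while (dispatches < maxDispatches) {
            dispatches++;
            try {
                if (reader.fill() > 0) {
                    break; // data arrived; normal processing would continue here
                }
                // 0 bytes: treated as "try again later", so the loop continues
            } catch (IOException e) {
                break; // error path: the socket would be closed, polling stops
            }
        }
        return dispatches;
    }

    public static void main(String[] args) {
        System.out.println("swallowing: " + dispatchCount(new SwallowingReader(), 1000));
        System.out.println("throwing:   " + dispatchCount(new ThrowingReader(), 1000));
    }
}
```

In this model the swallowing variant only stops because of the artificial cap, while the throwing variant stops after a single dispatch. Whether throwing is the right fix in AprEndpoint itself depends on what APR/native is really reporting with 20014, which is the open question here.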

I can reproduce the (new?) loop error with the following:
- clean trunk (9.0.x) build
- enable debug logging for AprEndpoint
- APR/native 1.2.10
- WebSocket drawboard example
- hold down F5

I see some error messages as per the original problem but fairly quickly
Tomcat enters the infinite loop.

Next steps are digging into the APR/native code.

Mark