On 05/01/2017 22:30, Mark Thomas wrote:
> On 05/01/2017 22:01, David Oswell wrote:
>> After some more digging I've been able to narrow down the problem
>> further, but I'm still not able to pinpoint the exact cause.
>> The issue is not load related; rather, it seems to be related to the
>> timing of the TCP connection being closed.
>> Depending on the timing, the poller and exec threads appear to get into
>> a loop. I drilled this down to an error status returned in AprEndpoint,
>> which might need to be thrown as an exception rather than returning 0
>> from fillReadBuffer in AprEndpoint.
>>
>> Poller thread - AprEndpoint:1573 - Poller(ID 55) wakes up
>> Poller thread - AprEndpoint:1652 - Poller adds socket to timeout
>> Poller thread - AprEndpoint:1675 - Poller gets rv = 1
>> Poller thread - AprEndpoint:1694 - Poller gets connection (socket Id 565911936)
>> Poller thread - AprEndpoint:1731 - Poller processesSocket as socket event OPEN_READ
>> Poller thread - AbstractEndpoint:903 - executor executes AprEndpoint$SocketProcessor (id 63) -> no exception thrown.
>>
>> Exec thread - AprEndpoint:2403 - Socket.recvb result = -20014
>>
>> Interesting comment at AprEndpoint:2445:
>>
>>     } else if (-result == Status.APR_EGENERAL && isSecure()) {
>>         // Status.APR_EGENERAL = 20014
>>         // Not entirely sure why this is necessary. Testing to date has not
>>         // identified any issues with this but log it so it can be tracked
>>         // if it is suspected of causing issues in the future.
>>         if (log.isDebugEnabled()) {
>>             log.debug(sm.getString("socket.apr.read.sslGeneralError",
>>                     getSocket(), this));
>>         }
>>         return 0;
>>
>> Does this need to throw an exception so that it is caught higher up as
>> an error?
> 
> Oh great. That code.
> 
> It originates here:
> http://svn.apache.org/viewvc?view=revision&revision=1534619
> 
> For the background see this thread:
> http://tomcat.markmail.org/thread/4vspjutd4kzqkc5q
> 
> As far as I could tell, something was happening in the TLS layer that
> APR/native was reporting as an error that wasn't really an error.
> Therefore, I changed Tomcat to ignore the report of an error and carried on.
> 
> What I suspect is happening is that you are seeing a real error that
> Tomcat now isn't treating as an error.
> 
> I do have a working build environment for tc-native on Windows now, so
> this is probably worth a revisit.
> 
> I'll put this at the top of my TODO list for after the 9.0.x and 8.5.x
> releases I've been meaning to start for the last couple of days.
> 
> Given how far you've got with this in a short time, if you wanted to
> continue digging that would be great. My suggestion for a way forward
> would be:
> - enable debug logging for AprEndpoint
> - try and recreate the 20014 error with the WebSocket drawing board
>   example (as described in the links above)
> - trace back into APR/native to figure out a) what is generating that
>   20014 error code and b) what it should really be generating.
> 
> If you need to build APR/native, the instructions are here:
> https://cwiki.apache.org/confluence/display/TOMCAT/Building+the+Tomcat+Native+Connector+binaries+for+Windows

I can reproduce the (new?) loop error with the following:
- clean trunk (9.0.x) build
- enable debug logging for AprEndpoint
- APR/native 1.2.10
- WebSocket drawboard example
- hold down F5

I see some error messages as per the original problem but fairly quickly
Tomcat enters the infinite loop.
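A minimal Java sketch of why swallowing the error could produce this loop, assuming (as the poller/exec trace quoted above suggests) that a 0-byte read sends the socket back to the poller for another OPEN_READ, while an exception closes it. All names below (AprLoopSketch, recvb, fillReadBuffer, cyclesUntilClosed, swallowGeneralError) are hypothetical stand-ins; fillReadBuffer here is a simplified model of the quoted branch, not Tomcat's implementation. Only the value 20014 for APR_EGENERAL comes from the thread.

```java
import java.io.IOException;

// Hypothetical model of the suspected poller/exec loop. None of these
// names are Tomcat or APR/native APIs; only APR_EGENERAL = 20014 is taken
// from the code quoted earlier in this thread.
public class AprLoopSketch {

    static final int APR_EGENERAL = 20014;

    // Stand-in for a socket whose every recvb() call fails with -APR_EGENERAL.
    static int recvb() {
        return -APR_EGENERAL;
    }

    // Simplified stand-in for the quoted branch: a negative result is a
    // negated APR status code. When swallowGeneralError is true, EGENERAL on
    // a secure socket is reported as "0 bytes read" instead of an error.
    static int fillReadBuffer(boolean secure, boolean swallowGeneralError)
            throws IOException {
        int result = recvb();
        if (result >= 0) {
            return result;
        }
        if (-result == APR_EGENERAL && secure && swallowGeneralError) {
            return 0;
        }
        throw new IOException("APR status " + (-result));
    }

    // Assumption: a 0-byte read sends the socket back to the poller, which
    // fires OPEN_READ again; an IOException closes the socket. Returns the
    // cycle on which the socket was closed, or -1 if it was still open after
    // maxCycles.
    static int cyclesUntilClosed(boolean swallowGeneralError, int maxCycles) {
        for (int cycle = 1; cycle <= maxCycles; cycle++) {
            try {
                fillReadBuffer(true, swallowGeneralError);
                // 0 bytes read: socket re-registered, poller fires again
            } catch (IOException e) {
                return cycle; // error path: socket closed, loop ends
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("return 0: " + cyclesUntilClosed(true, 1000));
        System.out.println("throw:    " + cyclesUntilClosed(false, 1000));
    }
}
```

Running main prints -1 for the "return 0" variant (the socket is still open after the 1000-cycle cap) and 1 for the throwing variant (closed on the first cycle). Whether throwing is the right fix still depends on what APR/native should really be returning instead of 20014.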

Next steps are digging into the APR/native code.

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
