Dear All,

Well, I managed to track this down. As it turned out, the problem was that I 
had a rather short TCP listen queue on the Tomcat connector port (100 elements) 
and that queue was overflowing.
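
(For anyone hitting the same thing: on FreeBSD, which is what I run, the
overflow can be made visible roughly like this; the exact flags may differ on
other systems.)

    # show the configured and current listen queue sizes per listening socket
    netstat -Lan | grep 8080
    # show the cumulative overflow counter in the TCP statistics
    netstat -s -p tcp | grep -i "listen queue"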

The solution was to 1) set acceptCount to a higher value in Tomcat and 2) 
configure the OS to allow applications to request longer accept queues. That 
last step was the one I had been missing. I had changed acceptCount before, 
but since the OS was capping the accept queue length, I did not see any 
improvement.
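
For the record, here is roughly what the two changes look like. This is only a 
sketch: the values are examples, and the sysctl name assumes FreeBSD (on Linux 
the equivalent limit is net.core.somaxconn).

    <!-- conf/server.xml: ask the OS for a longer accept queue on the connector
         (acceptCount is the listen backlog Tomcat requests; the default is 100) -->
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               acceptCount="1000" />

    # FreeBSD: raise the cap the kernel places on listen() backlogs
    sysctl kern.ipc.somaxconn=1024
    # and make it survive a reboot
    echo 'kern.ipc.somaxconn=1024' >> /etc/sysctl.conf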

More details can be found here:

http://java-monitor.com/forum/showthread.php?t=2492

A big thank you to all who contributed to this thread and helped me 
understand the problem.

Kees Jan


On 22 May 2012, at 14:45, André Warnier wrote:

> Kees Jan Koster wrote:
>> Dear André,
>>> Assuming that your client is really connecting to that HTTP connector on 
>>> port 8080 mentioned above..
>> Yes, it has a forwarded port 80 (using FreeBSD ipfw) that also points to 
>> 8080, and there is an Apache with mod_proxy_http that hooks into 8081. My 
>> tests are on the vanilla port, though.
> 
> Can you be a bit clearer on this part?  Do you see the problem happening for 
> 1 in 10 posts, when your client connects directly to Tomcat's HTTP port 8080? 
> Or is it only when the client connects to Tomcat via either one of these 
> intermediate pieces of machinery?
> 
>>> 1) You are getting a
>>> java.net.SocketException: Connection reset
>>>     at java.net.SocketInputStream.read(SocketInputStream.java:168)
>>> 
>>> so this appears to happen while your Java client is reading the 
>>> response from the server, and it appears that the client is expecting 
>>> to be able to read more data, but finds itself unable to, because the 
>>> socket has been closed "under its nose".
>> The reading is one area I need to look into: did the client get all data, 
>> partial data or none at all. I need to experiment with that.
>>> You say that it happens "frequently", so it's not always.
>> Indeed, not always. About 1 in 10 posts die like this on bad days. Sometimes 
>> there are hours with no issues. No pattern I can discern.
>>> 2) the server itself seems unaware that there is a problem.  So it has 
>>> already written the whole response back to the client, decided it was done 
>>> with this request, and gone happily to handle other things.
>> Precisely.
>>> That can happen, even if the client has not yet received all data, because 
>>> between the server and the client there is a lot of piping, and the data 
>>> may be buffered at various levels or still "in transit".
>>> 
>>> thus..
>>> 
>>> - either the client is misinterpreting the amount of data that it should be 
>>> reading from the server's response (trying to read more than there actually 
>>> is)
>>> (on the other hand, I think that the kind of exception you would get in 
>>> that case would be different, more like "trying to read beyond EOF" or so).
>>> - or something in between the server and the client closes the connection 
>>> before all data has been returned to the client (and/or is losing data).
>>> 
>>> It would be helpful to know if this happens when the response is 
>>> particularly large, or small, or if it is unrelated to the response size.
>> The response is a few bytes. I think it is about 10-20 bytes. Less than a 
>> packet, I expect. :)
> 
> That is quite strange, I think.
> See below.
> 
> 
>>> If the server is configured with an AccessLogValve, you should be able to 
>>> see how big the response was, in bytes.  If you have control over the 
>>> client code, you should be able to add something that logs how many bytes 
>>> it has read before the exception occurs.
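>>> 
>>> Something along these lines in conf/server.xml (inside the Host element) is 
>>> just a sketch of what I mean; the %b in the pattern logs the size of the 
>>> response body in bytes:
>>> 
>>>     <Valve className="org.apache.catalina.valves.AccessLogValve"
>>>            directory="logs" prefix="access_log." suffix=".txt"
>>>            pattern="%h %t &quot;%r&quot; %s %b" />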
>> What makes the response size interesting? What previous experience are you 
>> basing this question on?
> 
> Just that, intuitively, if a problem happens while reading the response, one 
> would expect that the larger the response is, the more likely it is that some 
> network issue would show up in the middle.
> 
> But now that I say this, going back to your initial message and the 
> stacktrace in it, I see
> ..
> at sun.net.www.http.HttpClient.parseHTTPHeader
> ...
> so the problem seems to show up right away, while the response's HTTP 
> *headers* are being read.  So it looks like when the problem happens, the 
> client is not able to read anything at all, not even the headers..
> 
> Do all failures show the same stacktrace, each with the problem occurring 
> while reading/parsing the response headers?
> 
> 
>>> Dumping the response HTTP headers to the client logfile would also help in 
>>> finding out what happens. (If the client is an applet running inside a 
>>> browser, then a browser add-on would show this easily, like "Live HTTP 
>>> Headers" for Firefox or Fiddler2 for IE.)
>> I can check whether I see the same problems from a browser using Firebug; 
>> that is a good idea. Thanks.
>>> Doing a "traceroute" from the client to the server may also give an idea 
>>> of what is actually between the server and the client.
>> mtr reports no packet loss between the two machines I used for testing.
> 
> Actually, I was thinking more of some problematic intermediate proxy or 
> the like.
> But a traceroute or similar would not show that.
> 
> See the first question above, about the direct/indirect connection 
> client-server.
> 
>>> And if this all still does not provide any clues, then you're down to a 
>>> network packet trace, using Wireshark or similar.
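>>> 
>>> For example, something like this (just a sketch; the interface name is a 
>>> guess and depends on your machine) captures the traffic on the connector 
>>> port into a file that Wireshark can open:
>>> 
>>>     tcpdump -i em0 -s 0 -w tomcat-8080.pcap tcp port 8080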
>> Packet traces I was hoping to avoid. :(
> 
> So far it smells to me like there is some network issue, with some 
> intermediate piece of software or hardware dropping the connection 
> between client and server after the client has sent the request, but before 
> it even starts receiving the response.
> Is there anything in between client and server which could behave this way, 
> for instance when it gets very busy?
> Do you have any kind of tool which can show you how many requests Tomcat is 
> processing over time, and whether these problems happen when it is handling 
> lots of requests?
> (Not that the problem appears to be at the Tomcat level, but just to check 
> how busy the network may be at such times.)
> 
> Another thing: your client is effectively requesting non-keepalive 
> connections, so Tomcat will close the connection after sending the response 
> to each request, and your clients have to build a new connection for each 
> request.
> If the same client(s) make lots of small requests one after another, this may 
> be counter-productive, because each connection setup requires several 
> packets going back and forth. Also, on the server side, when a connection is 
> being closed, it will nevertheless "linger" for a while in CLOSE_WAIT state, 
> waiting for the client's TCP stack to acknowledge the close.  I have seen 
> cases where a large number of such connections sitting in CLOSE_WAIT triggered 
> bizarre issues, such as a server becoming unable to accept new TCP 
> connections for a while.
> It may be worth checking how many such CLOSE_WAIT connections you have 
> over time, and whether this correlates with when the problems happen.
> netstat -pan | grep CLOSE_WAIT
> would show this. If more than a couple of hundred show up, I'd become 
> suspicious of something like that.
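> 
> A rough way to watch that count over time (just a sketch, assuming a Unix 
> shell; adjust the netstat flags to whatever your OS supports):
> 
>     while true; do date; netstat -an | grep -c CLOSE_WAIT; sleep 10; done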
> 


--
Kees Jan

http://java-monitor.com/
kjkos...@kjkoster.org
+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli

