Dear All,

Well, I managed to track this down. As it turned out, the problem was that I had a rather short TCP listen queue on the Tomcat connector port (100 elements) and that queue was overflowing.
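For anyone who runs into the same symptoms: the accept queue is simply the backlog of the listening socket (acceptCount is the value Tomcat asks the OS for), and the OS silently caps whatever the application requests. Here is a minimal sketch, not my actual setup (the port and backlog below are made up), of how a too-small backlog on a busy listener shows up as connection resets on the client side:

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

/*
 * Minimal sketch, not my actual configuration: a listener whose accept
 * queue (the "backlog" argument, which is what acceptCount maps to) is
 * far too small for the connection rate. While the accept loop is busy,
 * only a couple of connections can wait in the kernel queue; the rest
 * are dropped or reset, which the client reports as
 * "java.net.SocketException: Connection reset".
 */
public class TinyBacklogSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical port; a backlog of 2 is deliberately tiny. Note that
        // the OS caps this value, so asking for a large backlog only helps
        // if the kernel limit is raised as well.
        try (ServerSocket listener = new ServerSocket(8080, 2)) {
            while (true) {
                Socket client = listener.accept();
                try {
                    Thread.sleep(1000); // simulate a slow, busy server
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                client.close();
            }
        }
    }
}

(The kernel cap is typically kern.ipc.somaxconn on FreeBSD and net.core.somaxconn on Linux.)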
The solution was 1) to set acceptCount to a higher value in Tomcat and 2) to configure my OS to allow applications to specify longer accept queues. That last step was the one that was missing. I had changed acceptCount before, but since the OS was limiting the accept queue length I did not see any improvement.

More details can be found here: http://java-monitor.com/forum/showthread.php?t=2492

A big thank you to all who contributed to this thread and helped me understand the problem.

Kees Jan

On 22 May 2012, at 14:45, André Warnier wrote:

> Kees Jan Koster wrote:
>> Dear André,
>>> Assuming that your client is really connecting to that HTTP connector on port 8080 mentioned above..
>> Yes, it has a forwarded port 80 (using FreeBSD ipfw) that also points to 8080, and there is an Apache with mod_proxy_http that hooks into 8081. My tests are on the vanilla port, though.
>
> Can you be a bit clearer on this part? Do you see the problem happening for 1 in 10 posts when your client connects directly to Tomcat's HTTP port 8080? Or is it only when the client connects to Tomcat via either one of these intermediate pieces of machinery?
>
>>> 1) You are getting a
>>> java.net.SocketException: Connection reset
>>> at java.net.SocketInputStream.read(SocketInputStream.java:168)
>>>
>>> so this appears to happen when/while your java client is reading the response from the server, and it appears that the client is expecting to be able to read more data, but finds itself unable to, because the socket has been closed "under his nose".
>> The reading is one area I need to look into: did the client get all data, partial data or none at all. I need to experiment with that.
>>> You say that it happens "frequently", so it's not always.
>> Indeed, not always. About 1 in 10 posts die like this on bad days. Sometimes hours with no issues. No pattern I can discern.
>>> 2) the server itself seems unaware that there is a problem. So it has already written the whole response back to the client, decided it was done with this request, and gone happily to handle other things.
>> Precisely.
>>> That can happen, even if the client has not yet received all data, because between the server and the client there is a lot of piping, and the data may be buffered at various levels or still "in transit".
>>>
>>> thus..
>>>
>>> - either the client is misinterpreting the amount of data that it should be reading from the server's response (trying to read more than there actually is)
>>> (on the other hand, I think that the kind of exception you would get in that case would be different, more like "trying to read beyond EOF" or so).
>>> - or something in between the server and the client closes the connection before all data has been returned to the client (and/or is losing data).
>>>
>>> It would be helpful to know if this happens when the response is particularly large, or small, or if it is unrelated to the response size.
>> The response is a few bytes. I think it is about 10-20 bytes. Less than a packet, I expect. :)
>
> That is quite strange, I think.
> See below.
>
>>> If the server is configured with an AccessLogValve, you should be able to see how big the response was, in bytes. If you have control over the client code, you should be able to add something that logs how many bytes it has read before the exception occurs.
>> What makes the request size interesting? What previous experience are you basing this question on?
>
> Just that intuitively, if a problem happens while reading the response, one would expect that the larger the response is, the more likely it is that some network issue would show up in the middle.
>
> But now that I say this, going back to your initial message and the stacktrace in it, I see
> ..
> at sun.net.www.http.HttpClient.parseHTTPHeader
> ...
> so the problem seems to show up right away, while the response's HTTP *headers* are being read. So it looks like when the problem happens, the client is not able to read anything at all, not even the headers..
>
> Do all problems show the same stacktrace, all with a problem while reading/parsing the response headers?
>
>>> Dumping the response HTTP headers to the client logfile would also help finding out what happens. (If the client is an applet running inside of a browser, then a browser add-on would show this easily (like "Live HTTP Headers" for Firefox, or Fiddler2 for IE)).
>> I can check that I see the same problems from a browser using firebug, that is a good idea. Thanks.
>>> Doing a "traceroute" from the client to the server may also give an idea of what there actually is between the server and the client.
>> mtr reports no packet loss between the two machines I used for testing.
>
> Actually, I was more thinking about some intermediate problematic proxy or something.
> But a traceroute or similar would not show that.
>
> See the first question above, about the direct/indirect connection client-server.
>
>>> And if this all still does not provide any clues, then you're down to a network packet trace, using Wireshark or similar.
>> Packet traces I was hoping to avoid. :(
>
> So far it smells to me like there is some network issue, with some intermediate software or hardware part which is dropping the connection between client and server after the client has sent the request, but before it even starts receiving the response.
> Is there anything in between client and server which could have this behaviour, such as when it gets very busy?
> Do you have any kind of tool which can show you how many requests Tomcat is processing over time, and whether these problems happen when it is handling lots of requests?
> (Not that the problem appears to be at the Tomcat level, but just to check how busy the network may be at such times.)
>
> Another thing: your client is effectively requesting non-keepalive connections, so Tomcat will close the connection after sending the response to each request, and your clients have to set up a new connection for each request.
> If the same client(s) make lots of small requests one after another, this may be counter-productive, because each connection build-up requires several packets going back and forth. Also, on the server side, when a connection is being closed, it will nevertheless "linger" for a while in CLOSE_WAIT state, waiting for the client's TCP stack to acknowledge the CLOSE. I have seen cases where a large number of such connections in CLOSE_WAIT triggered bizarre issues, such as a server becoming unable to accept new TCP connections for a while.
> It may be worth checking how many such CLOSE_WAIT connections you have over time, and whether this relates to when the problems happen.
> netstat -pan | grep CLOSE_WAIT
> would show this. If more than a couple of hundred show up, I'd become suspicious of something like that.
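For reference, a minimal sketch of the keep-alive behaviour André mentions above, assuming the client uses java.net.HttpURLConnection (which the stack trace suggests); the URL and loop below are made up for illustration:

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class KeepAliveClientSketch {
    public static void main(String[] args) throws IOException {
        // The JVM pools HTTP connections by default; these properties just
        // make the intent explicit. Sending "Connection: close" per request,
        // or setting http.keepAlive=false, forces a fresh TCP handshake for
        // every request.
        System.setProperty("http.keepAlive", "true");
        System.setProperty("http.maxConnections", "5");

        for (int i = 0; i < 10; i++) {
            // Hypothetical URL, for illustration only.
            URL url = new URL("http://example.com:8080/app/ping");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                // Read the response to the end; an unread body prevents the
                // underlying socket from being returned to the pool.
                byte[] buffer = new byte[512];
                while (in.read(buffer) != -1) {
                    // discard; the responses here are only a few bytes anyway
                }
            }
            // No conn.disconnect() here: that may close the pooled connection.
        }
    }
}

The key points are not sending "Connection: close", reading each response to the end, and closing the stream rather than calling disconnect(), so the JVM can return the socket to its keep-alive pool.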
--
Kees Jan

http://java-monitor.com/
kjkos...@kjkoster.org
+31651838192

The secret of success lies in the stability of the goal. -- Benjamin Disraeli