Hi Chris,

On 22.05.2009 14:14, Christopher Schultz wrote:
> Rainer,
> 
> On 5/21/2009 12:21 PM, Rainer Jung wrote:
>> 2 remarks about all your stress testing efforts:
> 
>> A) TIME_WAIT
> 
>> When not doing HTTP Keep-Alive, under high load the size of the TCP hash
>> table and how efficiently the system can look up TCP connections
>> can limit the throughput you can reach. More precisely, depending on the
>> exact way of connection shutdown, you get TIME_WAIT states for the
>> finished connections (without HTTP Keep-Alive it could be one such
>> connection per request). Most systems get slow once the number of those
>> connections reaches something around 30000.
> 
> That's fine, but the TIME_WAIT connections shouldn't be counted against the
> process's file limit, should they? At that point, the process has released
> the connection and the OS is babysitting it through the final stages of
> TCP shutdown.

Those connections will *not* be counted against the process's file
descriptors. They exist only as entries in the kernel's TCP connection
table and are no longer associated with the process. It's more of a TCP
housekeeping thing.

> I understand that, with keepalive disabled, performance will kind of
> suck. But, I shouldn't be running out of file descriptors.

Not out of FDs, but if the number of TIME_WAITs gets huge (check via
netstat during the run), your TCP throughput will drop and will be
restricted by the size of the connection hash.
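
A rough sketch of that check, in Python, in case you want to log the
numbers during a run. It only parses text, so you would feed it the output
of something like "netstat -ant". The function name and the sample rows are
made up for illustration, and it assumes the Linux-style layout where TCP
rows start with "tcp" and end with the connection state; other platforms
format this differently.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in `netstat -ant`-style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumption: TCP rows start with "tcp" (tcp, tcp4, tcp6) and the
        # connection state is the last column. Header rows are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


# Made-up sample rows, only to show the expected input shape.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:8080          127.0.0.1:51234         TIME_WAIT
tcp        0      0 127.0.0.1:8080          127.0.0.1:51235         TIME_WAIT
tcp        0      0 127.0.0.1:8080          127.0.0.1:51236         ESTABLISHED
"""

print(count_tcp_states(SAMPLE)["TIME_WAIT"])  # prints 2
```

Sampling this every few seconds during the test shows whether the
TIME_WAIT count keeps climbing or levels off.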

>> E.g. if you are doing 2000 requests per second without HTTP Keep Alive
>> and the combination of web server and stress test tool leads to
>> TIME_WAITs, after 15 seconds your table size might reach a critical size.
> 
> Meaning that the kernel can't keep up, or the NIO connector can't keep
> up? I suspect the latter, because the other tests under the same
> conditions at least complete... the NIO one appears not to have a
> chance. Now, I'm running 6 tests and the NIO test is the 5th one, so
> it's possible that it's just poorly positioned in my test battery. But,
> since I've observed this failure at essentially the same place each
> time, I suspect the NIO connector itself is at fault.

I'm talking about a very general TCP thing. I'm not saying you actually
ran into it, but I'm saying that it makes sense to check the number of
TIME_WAITs via netstat during the test. If it gets very big, then the
TCP implementation will limit your throughput and most likely will
become the first bottleneck you hit. Again: I'm not saying that already
happened, but you should check whether you run into this while doing
the test.
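
To put rough numbers on the 2000 requests per second example quoted above,
here is a back-of-the-envelope sketch in Python. It assumes one TIME_WAIT
entry per request. The 60 second TIME_WAIT lifetime is an assumption for
illustration only; the real value depends on the operating system and its
settings. The function names are made up.

```python
def seconds_to_reach(threshold, requests_per_second):
    """Seconds until `threshold` TIME_WAIT entries exist.

    Assumes one new entry per request and that none has expired yet,
    so it only holds while the result is below the TIME_WAIT lifetime.
    """
    return threshold / requests_per_second


def steady_state_time_wait(requests_per_second, time_wait_seconds):
    """Entries present once old ones expire as fast as new ones appear."""
    return requests_per_second * time_wait_seconds


print(seconds_to_reach(30000, 2000))     # prints 15.0
print(steady_state_time_wait(2000, 60))  # prints 120000
```

So under these assumptions the table passes 30000 entries after about 15
seconds and would keep growing well beyond that if the load is sustained.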

>> Not using HTTP Keep-Alive will very likely quickly limit the achievable
>> throughput when going up in concurrency.
> 
> I'm willing to accept that, but 40 max connections should not be
> resulting in hundreds of left-open file descriptors.

The file descriptor thing is totally independent. I hijacked the thread :)

Regards,

Rainer
