On 04:50 am, [email protected] wrote:
> I've been prototyping a client that connects to thousands of servers and
> calls some method. It's not real important to me at this stage whether
> that's via xmlrpc, perspective broker, or something else.
>
> What seems to happen on the client machine is that each network connection
> that gets opened and then closed goes into a TIME_WAIT state, and
> eventually there are so many connections in that state that it's
> impossible to create any more.
Yep. That's what happens to a TCP connection when you close it.

> I'm keeping an eye on the output of
> netstat -an | wc -l
> Initially I've got 569 entries there. When I run my test client, that
> ramps up really quickly and peaks at about 2824. At that point, the
> client reports a callRemoteFailure:

Presumably these numbers have something to do with how quickly you're
opening and closing new connections. TIME_WAIT lasts for 2MSL (4 minutes)
to ensure that a future connection doesn't receive data intended for a
previous connection (clearly a bad thing).

However... 2824 is a pretty low number at which to run out of sockets.
Perhaps you're running this software on Windows? I think Windows has a
ridiculously small number of "client sockets" allocated by default. I seem
to recall this being something you can change with a registry edit or
something like that. Another option would be to switch to a POSIX platform
instead.

If you're *not* on Windows, then this is odd and perhaps bears further
scrutiny.

> callRemoteFailure [Failure instance: Traceback (failure with no frames):
> <class 'twisted.internet.error.ConnectionLost'>: Connection to the other
> side was lost in a non-clean fashion: Connection lost.

This isn't exactly how I'd expect it to fail, but I also don't know what
"callRemoteFailure" is or where it comes from, so maybe that's not too
surprising.

> Increasing the file descriptor limits doesn't seem to have any effect on
> this.

Quite so. The process has, after all, already closed these sockets. They
no longer count towards the process's file descriptor limit (oh dear, I
suppose you're not using Windows if you have a file descriptor limit to
raise).

> Is there an established Twisted-sanctioned canonical way to free up this
> resource? Or am I doing something wrong? I'm looking into tweaking
> SO_REUSEADDR and SO_LINGER - does that sound sane?
>
> Just tapping the lazywebs to see if anyone's already seen this in the
> wild.

On most reasonably configured Linux machines, you shouldn't run into this
problem until you're doing at least an order of magnitude more work. Many
times, I have run clients that do many thousands of new connections per
second, resulting in tens of thousands of TIME_WAIT sockets on the system
with no problem. So, I'm not sure why you're running into this after only
a few thousand.

Jean-Paul

_______________________________________________
Twisted-Python mailing list
[email protected]
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
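
As a concrete illustration of the pacing point above, here is a rough
sketch (not code from the thread) of a client that calls one XML-RPC
method on many servers while capping how many connections are open at
once; the URLs, method name, and concurrency value are placeholders.
Bounding the churn this way does not eliminate TIME_WAIT, it just limits
how many such sockets can pile up during the 2MSL interval.

from twisted.internet import defer, reactor
from twisted.web.xmlrpc import Proxy


def call_everywhere(urls, method, concurrency=50):
    # Keep at most `concurrency` calls (and therefore TCP connections)
    # outstanding at any moment.
    sem = defer.DeferredSemaphore(concurrency)

    def one_call(url):
        # Each Proxy opens a fresh connection for its call, so the
        # semaphore also bounds the rate of connection setup and teardown.
        return Proxy(url).callRemote(method)

    calls = [sem.run(one_call, url) for url in urls]
    return defer.gatherResults(calls, consumeErrors=True)


if __name__ == '__main__':
    # Placeholder targets; substitute the real server list.
    urls = [b'http://server%d.example.com:8080/RPC2' % i
            for i in range(5000)]
    d = call_everywhere(urls, 'some_method')
    d.addBoth(lambda _: reactor.stop())
    reactor.run()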

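For reference, here is a similarly rough sketch of what the SO_LINGER idea
from the original question looks like on a Twisted TCP client; the
protocol and payload are placeholders, and this is a workaround with real
trade-offs rather than a sanctioned fix. With l_onoff=1 and l_linger=0 the
eventual close() sends an RST instead of completing the normal FIN
handshake, so this end never sits in TIME_WAIT, but any unsent or
unacknowledged data is discarded. SO_REUSEADDR, by contrast, mainly lets a
listening socket re-bind a port and does not clear client-side TIME_WAIT
entries.

import socket
import struct

from twisted.internet.protocol import Protocol


class AbortiveCloseProtocol(Protocol):
    def connectionMade(self):
        # Twisted's TCP transports expose the underlying socket object
        # through getHandle() (the ISystemHandle interface).
        sock = self.transport.getHandle()
        # struct linger { l_onoff = 1, l_linger = 0 }: close() aborts the
        # connection with an RST, so this side skips TIME_WAIT entirely.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack('ii', 1, 0))
        self.transport.write(b'hello\r\n')

    def dataReceived(self, data):
        # Once the (placeholder) exchange is done, drop the connection;
        # with the linger setting above this becomes an abortive close.
        self.transport.loseConnection()

Only reach for this when the application protocol makes it unambiguous
that both sides are finished; otherwise the RST can cut off data still in
flight.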