Out of desperation at not finding the real memory leak on the production server,
I wrote a test server that I can push to arbitrarily high RSS memory. I am far from sure whether this is the same leak that I observe in production, but I would like to understand what this one is. This is the server code:

Server.py:

import twisted.protocols.basic
from twisted.internet.protocol import Factory
from twisted.internet import reactor

class HiRate(twisted.protocols.basic.LineOnlyReceiver):
    MAX_LENGTH = 20000

    def lineReceived(self, line):
        if line == 'get':
            out = 'a'*4000 + '\r\r'
            self.transport.write(out)

factory = Factory()
factory.protocol = HiRate

reactor.listenTCP(8007, factory, backlog=50, interface='10.18.0.2')
reactor.run()

This server has to be flooded with "get" requests from this client:

Client.py:

import socket, time

HOST = '10.18.0.2'
PORT = 8007

def client():
    """High-rate client; needs a dedicated CPU to run."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect((HOST, PORT))
    except socket.error, e:
        print 'client error is %s' % e
        return
    n = 0
    while 1:
        #print "iter %s" % n
        #time.sleep(.001)
        s.send('get\r\n')
        # Drain whatever the server has sent so far without blocking.
        s.setblocking(0)
        try:
            r = s.recv(1024)
        except socket.error:
            r = 0
        while r:
            #print r
            try:
                r = s.recv(1024)
            except socket.error:
                r = 0
        s.setblocking(1)
        n += 1

client()

To reproduce the memory leak, I either need two machines with a fast LAN between them (since the client program takes 100% CPU), or possibly one machine with a dual-core CPU (I have not tried that). It is important that client.py is given a separate CPU to run on.

When the response from the server is long enough (out = 'a'*4000+'\r\r'; 4000 is enough in my case), the RSS of the server process starts to grow without bound. If you introduce a small delay in the client (uncomment time.sleep(.001)), the leak does not occur.

Looking at tcpdump on the server machine, I sometimes see many "get" packets from the client in a row that are not followed by response packets from the server with payload 'aaaaa...'. Only when the server is in this "overwhelmed" state does the memory seem to grow without bound. I first thought it might be an issue of an unbounded send queue on the server, but examining Send-Q with netstat shows that Send-Q saturates at a certain ceiling value while the RSS of the server process continues to grow (see the backpressure sketch below the commands).

Here are some commands I was using to watch the parameters of the server:

Watch Send-Q and Recv-Q:

root$ watch -n1 netstat -an

RSS memory of the server:

root$ watch -n1 ps -orss -p`netstat -nlp | grep :8007 | awk '{print $7}' | cut -d/ -f1`

Traffic to/from the server:

root$ tcpdump -A -s10024 -nn -i eth1 'port 8007'

(in my case eth1 is the LAN interface to the client)
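One thing the Send-Q observation suggests: netstat only shows the kernel socket buffers, while Twisted also queues unsent bytes in a userspace write buffer inside the transport, and that buffer grows without bound unless the application applies backpressure. If that is where the memory is going, registering a producer should stop the growth. Below is a minimal sketch of such a variant of the test server, assuming Twisted's standard IPushProducer mechanism; the backpressure methods are my addition, everything else mirrors Server.py above:

import twisted.protocols.basic
from twisted.internet.protocol import Factory
from twisted.internet import interfaces, reactor
from zope.interface import implements

class HiRate(twisted.protocols.basic.LineOnlyReceiver):
    # Sketch only: the same protocol as above, plus write-side backpressure.
    implements(interfaces.IPushProducer)
    MAX_LENGTH = 20000

    def connectionMade(self):
        # Ask the transport to call pauseProducing()/resumeProducing()
        # on us as its userspace write buffer fills up and drains.
        self.transport.registerProducer(self, True)

    def pauseProducing(self):
        # Write buffer is above its high-water mark: stop reading, so
        # lineReceived() stops queueing new responses.
        self.transport.pauseProducing()

    def resumeProducing(self):
        # Buffer has drained: start reading (and hence responding) again.
        self.transport.resumeProducing()

    def stopProducing(self):
        pass

    def lineReceived(self, line):
        if line == 'get':
            self.transport.write('a'*4000 + '\r\r')

factory = Factory()
factory.protocol = HiRate
reactor.listenTCP(8007, factory, backlog=50, interface='10.18.0.2')
reactor.run()

With the protocol registered as a streaming (push) producer, the transport calls pauseProducing() when its write buffer crosses its high-water mark; pausing the read side stops lineReceived() from queueing further responses until the buffer drains. If RSS levels off with this change, the unbounded growth is buffered response data rather than a true leak.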
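Separately, to tell whether the growing RSS corresponds to live Python objects or to memory held below the Python level, the heap can be sampled from inside the server process with Heapy, as in the measurements quoted below. A rough sketch, assuming guppy is installed; the ten-second interval is arbitrary, and these lines would go before reactor.run():

from guppy import hpy
from twisted.internet import task

_hp = hpy()

def sample_heap():
    # heap().size is the total bytes held by live Python objects; if RSS
    # keeps growing while this number stays flat, the growth is outside
    # the Python object heap (C-level allocations, buffers, fragmentation).
    print 'python heap: %d bytes' % _hp.heap().size

task.LoopingCall(sample_heap).start(10.0)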
> -----Original Message-----
> From: twisted-python-boun...@twistedmatrix.com [mailto:twisted-python-
> boun...@twistedmatrix.com] On Behalf Of Werner Thie
> Sent: Monday, February 22, 2010 11:39 PM
> To: Twisted general discussion
> Subject: Re: [Twisted-Python] debugging a memory leak
>
> Hi
>
> If memory that is not released to the OS can be reused by the
> interpreter (because of the suballocation system the interpreter uses),
> that should eventually lead to overall memory usage leveling out over
> time; that's what I observe with our processes (sitting at several
> 100 MB per process). We are using external C libraries which do lots of
> malloc/free, and one of the bigger sources of pain is indeed bringing
> such a library to the point where it is clean, not only by freeing all
> memory it allocates under every circumstance, but also
> Python-refcounting-wise. I usually go through all the motions of
> building a complete debug chain for all modules involved in a project
> and writing a test bed to prove a clean and proper implementation.
>
> So if you're using C/C++-based modules in your project, I would mark
> them as highly suspicious of being responsible for leaks until proven
> otherwise.
>
> Not to bother you with numbers, but I usually allocate about 30% of
> overall project time to bringing a server into a production-ready
> state, meaning uptimes of months/years, no fishy feelings, no
> performance oscillations, predictable caving and recuperating when
> overloaded: just all the things you have to tick off to sign a project
> off as completed, meaning you don't have to do daily 'tire kicking'
> maintenance and periodic reboots.
>
> Werner
>
> Alec Matusis wrote:
> > Hi Maarten,
> >
> > Your link
> > http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
> > seems to suggest that even though the interpreter does not release
> > memory back to the OS, it can be re-used by the interpreter.
> > If this were our problem, I'd expect the memory to be set by the
> > highest usage, as opposed to constantly leaking: in my case, the
> > load is virtually constant, but the memory still leaks over time.
> >
> > The environment is Linux 2.6.24 x86-64; the extensions used are
> > MySQLdb and pyCrypto (latest stable releases of both).
> >
> >> -----Original Message-----
> >> From: twisted-python-boun...@twistedmatrix.com [mailto:twisted-
> >> python-boun...@twistedmatrix.com] On Behalf Of Maarten ter Huurne
> >> Sent: Monday, February 22, 2010 6:24 PM
> >> To: Twisted general discussion
> >> Subject: Re: [Twisted-Python] debugging a memory leak
> >>
> >> On Tuesday 23 February 2010, Alec Matusis wrote:
> >>
> >>> When I start the process, both Python object sizes and their
> >>> counts rise proportionally to the number of reconnected clients,
> >>> and then they stabilize after all clients have reconnected.
> >>> At that moment, the "external" RSS process size is about 260MB.
> >>> The "internal" size of all Python objects reported by Heapy is
> >>> about 150MB. After two days, the internal sizes/counts stay the
> >>> same, but the external size grows to 1500MB.
> >>>
> >>> Python object counts/total sizes are measured from the manhole.
> >>> Is this sufficient to conclude that this is a C memory leak in one
> >>> of the external modules or in the Python interpreter itself?
> >>
> >> In general, there are other reasons why heap size and RSS size do
> >> not match:
> >> 1. pages are empty but not returned to the OS
> >> 2. pages cannot be returned to the OS because they are not
> >> completely empty
> >>
> >> It seems Python has different allocators for small and large
> >> objects:
> >> http://www.mail-archive.com/python-l...@python.org/msg256116.html
> >> http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
> >>
> >> Assuming Python uses malloc for all its allocations (does it?), it
> >> is the malloc implementation that determines whether empty pages
> >> are returned to the OS. Under Linux with glibc (your system?),
> >> empty pages are returned, so reason 1 does not apply.
> >>
> >> Depending on the allocation behaviour of Python, the pages may not
> >> be empty though, so reason 2 is a likely suspect.
> >>
> >> Python extensions written in C could also leak or fragment memory.
> >> Are you using any extensions that are not pure Python?
> >>
> >> Bye,
> >> Maarten

_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python