I have been operating a busy twisted TCP server for quite some time (1 year+).
Yesterday, it strangely crashed under peak load for that day (it has been under more load in the past). # grep segfault /var/log/messages Mar 20 19:12:15 serv2 kernel: [17687209.144548] twistd[10701]: segfault at 8 rip 41eccf rsp 7fff33675270 error 6 I cannot understand the reason of the crash, except that its some very rare race condition. While the code is quite cumbersome, here are some general details: 8 min before the crash, the server started giving repeated error on clients disconnect, that I have never seen before in the last year. It first produced this twice: 2009/03/20 19:04 -0700 [ClientProtocol,318930,70.132.227.150] Unhandled Error Traceback (most recent call last): File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/application/a pp.py", line 113, in runReactorWithLogging reactor.run() File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi xbase.py", line 220, in run self.mainLoop() File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi xbase.py", line 231, in mainLoop self.doIteration(t) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol lreactor.py", line 181, in doPoll log.callWithLogger(selectable, _drdw, selectable, fd, event) --- <exception caught here> --- File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py ", line 48, in callWithLogger return callWithContext({"system": lp}, func, *args, **kw) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py ", line 33, in callWithContext return context.call({ILogContext: newCtx}, func, *args, **kw) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex t.py", line 59, in callWithContext return self.currentContext().callWithContext(ctx, func, *args, **kw) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex t.py", line 37, in callWithContext return func(*args,**kw) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol lreactor.py", line 212, in _doReadOrWrite self._disconnectSelectable(selectable, why, inRead) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi xbase.py", line 252, in _disconnectSelectable selectable.readConnectionLost(f) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp. py", line 405, in readConnectionLost self.connectionLost(reason) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp. py", line 416, in connectionLost protocol.connectionLost(reason) File "/nail/live/im/myserv/protocols.py", line 170, in connectionLost self.session.logout() File "/nail/live/im/myserv/service.py", line 1199, in logout if self.nick in session.sellers: exceptions.AttributeError: Deferred instance has no attribute 'sellers' and then a different error in the same line with the same call stack, that was repeated about 1000 times for different logging out clients: 2009/03/20 19:04 -0700 [ClientProtocol,326158,99.245.246.187] Unhandled Error Traceback (most recent call last): File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/application/a pp.py", line 113, in runReactorWithLogging reactor.run() File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi xbase.py", line 220, in run self.mainLoop() File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi xbase.py", line 231, in mainLoop self.doIteration(t) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol lreactor.py", line 181, in doPoll log.callWithLogger(selectable, _drdw, selectable, fd, event) --- <exception caught here> --- File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py ", line 48, in callWithLogger return callWithContext({"system": lp}, func, *args, **kw) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py ", line 33, in callWithContext return context.call({ILogContext: newCtx}, func, *args, **kw) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex t.py", line 59, in callWithContext return self.currentContext().callWithContext(ctx, func, *args, **kw) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex t.py", line 37, in callWithContext return func(*args,**kw) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol lreactor.py", line 212, in _doReadOrWrite self._disconnectSelectable(selectable, why, inRead) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi xbase.py", line 252, in _disconnectSelectable selectable.readConnectionLost(f) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp. py", line 405, in readConnectionLost self.connectionLost(reason) File "/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp. py", line 416, in connectionLost protocol.connectionLost(reason) File "/nail/live/im/myserv/protocols.py", line 170, in connectionLost self.session.logout() File "/nail/live/im/myserv/service.py", line 1199, in logout if self.nick in session.sellers: exceptions.AttributeError: WLResultSet instance has no attribute 'sellers' This last error repeated about 1000 times for different clients, after which the segfault occurred. The problem is, that session is an instance of a Session class that is defined in the code, it is NEITHER a Deferred NOR an instance of WLResultSet class- these are completely different classes, and theres no possible way in the code for variable session to become an instance of Deferred. The code is approximately like this: def logout(self): self.on_logout(self.do_logout) for session in self.service.client_sessions: ß self.service.client_sessions is a dictionary of custom Session class instances if self.nick in session.sellers: ß error here, it claims that session is a Deferred or sometimes an instance of a completely unrelated class #Some statements The function self.on_logout sometimes (rarely) dispatches a deferred (it has maybeDeferred in it), such that its argument self.on_logout is called when that deferred returns (otherwise its called synchronously). My question is, is it possible to muck up deferred in such a way, that after dispatching a deferred chain inside self.on_logout(self.do_logout) , the remaining statements would be executed in some unexpected local scope, such that I would see these weird errors where my for loop iterator would suddenly go over Deferred instances, instead of the dictionary of Session instances that I define in the code ?? Also, why did these errors lead to segfault of the whole twisted process after 8 minutes?
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python