On Sat, 08 Aug 2009 16:32 +0000, exar...@twistedmatrix.com wrote: > On 5 Aug, 03:59 pm, peter.westl...@pobox.com wrote: > >Background: I have a client program that calls callRemote, but the > >Deferred that callRemote returns is not fired. This is an intermittent > >error that only happens after some hours of traffic. ... > >The client sends a request. > >The server sends a reply with the same tag, and logs a message. > >Until it goes wrong, the client receives the reply and logs it. > >When it goes wrong, the client does not see the reply. > > > >The protocol has the unfired Deferred in _outstandingRequests, > >with the missing tag as key. > > > >All this suggests that the problem is either in the low-level > >network code, or somewhere in the network between the client > >and server. But doesn't TCP/IP tell you if a packet doesn't > >get through? > > It doesn't really /tell/ you. But if there is some issue with > the network that prevents packet delivery for long enough, then > the connection breaks (giving you a connectionLost call in Twisted, > which should errback your Deferred). > > > >I tried tracing the packets using a python-pycapy script (attached), > >but it showed more packets disappearing than actually were. > > It might be simpler to use something like wireshark, which already > knows how to do TCP stream re-assembly and such. > > I know this isn't much of a suggestion, but it's the only thing that > really comes to mind.
No, it's a good suggestion - the only problem is that it makes the problem happen far less often! What I've done is to capture the first few bytes of each packet, enough to show the whole of the AMP reply packets (which are just {'_answer':tag, 'status':True}) without any need for full-scale analysis and reassembly. I'm pleased to say that after many hours of watching tcpdump, the bug eventually happened again just minutes before I left for a long weekend. Sure enough, TCP/IP delivers the packet successfully. Tomorrow's first task is going to be putting tracing into the low-level code where the reactor is meant to receive the packet and call the AMP _outstandingRequests entry for it. It would have been almost impossible to get even this far without the Manhole feature. I've been using it all along to inspect the workings of the client code, and it was a big help on the server side too. The server handles a lot of connections, and I was able to set a debug flag in the protocol object for the connection I was tracing. There would have been very much more log data otherwise. It makes a big difference being able to see values of variables without having to add new log statements and spend another working day waiting for the bug. Whoever invented it - thank you! Peter. _______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python