Background: I have a client program that calls callRemote, but the Deferred that callRemote returns is not fired. This is an intermittent error that only happens after some hours of traffic.
By putting some logging into AMP, it's apparent that the server gets as far as sending the reply using BoxDispatcher._safeEmit. The original version of that ignores connection errors, but I overrode it with one that doesn't: trace_tags = False def _safeEmit(self, aBox): """ Emit a box, ignoring L{ProtocolSwitched} and L{ConnectionLost} errors which cannot be usefully handled. """ tag = None if amp.ANSWER in aBox: tag = aBox[amp.ANSWER] elif amp.ERROR in aBox: tag = aBox[amp.ERROR] if self.trace_tags and tag: log.debug('_safeEmit', tag) try: aBox._sendTo(self.boxSender) except (ProtocolSwitched, ConnectionLost): log.debug('ProtocolSwitched or ConnectionLost in _safeEmit', tag) The message in the last line is never seen, nor is there anything else in the logs on client or server to suggest that there has been any problem with the connection. On the client, BoxDispatcher._answerReceived logs all replies: def _answerReceived(self, box): """ An AMP box was received that answered a command previously sent with L{callRemote}. @param box: an AmpBox with a value for its L{ANSWER} key. """ b = box[ANSWER] question = self._outstandingRequests.pop(box[ANSWER]) question.addErrback(self.unhandledError) log.msg('_answerReceived', b, question) question.callback(box) log.msg('_answerReceived', b, 'returning') Likewise _errorReceived. I always see both log.msgs or neither, so it isn't something going wrong in the callback. The client sends a request. The server sends a reply with the same tag, and logs a message. Until it goes wrong, the client receives the reply and logs it. When it goes wrong, the client does not see the reply. The protocol has the unfired Deferred in _outstandingRequests, with the missing tag as key. All this suggests that the problem is either in the low-level network code, or somewhere in the network between the client and server. But doesn't TCP/IP tell you if a packet doesn't get through? I tried tracing the packets using a python-pycapy script (attached), but it showed more packets disappearing than actually were. That could be for several reasons, including the packet sniffer missing packets, and the _ask or _answer key not being the first thing in the box. It also made the problem go away to a large extent, though that may have been the "tail -F" I was running against the log. It does take a lot longer for the bug to manifest when tracing it. At this point, I'm stuck! If there isn't a solution to this, the only answer will be to time out and retry the command. But since TCP connections are meant to be reliable (or to give an error if they fail), I hope it won't come to that. Peter. _______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python