Hi Gordon, thanks for your reply. That was exactly the problem with my example: because the request was never acknowledged, the old messages stayed stuck in the queue, so the RPC reply was never delivered. I based my test program on oslo.messaging/tests/test_rabbit.py, which didn't have any calls to acknowledge(). The thing is, the example produces errors exactly like the ones I'm seeing in nova when rabbit dies and we reconnect to a new rabbit instance. I'm now tracing through the nova calls in the rabbit reconnect case to confirm that acknowledge() is always called when it should be.
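
To make the failure mode concrete, here is a minimal sketch that uses a toy in-memory queue instead of oslo.messaging or RabbitMQ. `ToyBroker` and `run_once` are hypothetical names invented for this illustration, and the redelivery behaviour is an assumption modelled on the theory quoted below: an unacknowledged request goes back on the queue when the run ends, so the next run polls the stale request and replies to an address nobody is listening on.

```python
from collections import deque


class ToyBroker(object):
    """Hypothetical stand-in for a queue that outlives a single test run."""

    def __init__(self):
        self.queue = deque()            # requests waiting for delivery
        self.unacked = []               # delivered but not yet acknowledged
        self.live_reply_queues = set()  # reply-to addresses of running senders
        self.replies = {}               # reply-to address -> reply body

    def send(self, body, reply_to):
        self.queue.append({"body": body, "reply_to": reply_to})

    def poll(self):
        msg = self.queue.popleft()
        self.unacked.append(msg)
        return msg

    def acknowledge(self, msg):
        self.unacked.remove(msg)

    def reply(self, msg, body):
        # A reply to an address with no listener is silently dropped.
        if msg["reply_to"] in self.live_reply_queues:
            self.replies[msg["reply_to"]] = body

    def disconnect(self, reply_to):
        # The run ends: its reply queue disappears and any unacknowledged
        # requests return to the front of the queue.
        self.live_reply_queues.discard(reply_to)
        for msg in reversed(self.unacked):
            self.queue.appendleft(msg)
        self.unacked = []


def run_once(broker, run_id, ack):
    """One run of the test program: send a request, poll once, reply."""
    reply_to = "reply_%d" % run_id
    broker.live_reply_queues.add(reply_to)
    broker.send({"send": run_id}, reply_to)
    msg = broker.poll()
    if ack:
        broker.acknowledge(msg)
    broker.reply(msg, {"reply": run_id})
    got_reply = reply_to in broker.replies
    broker.disconnect(reply_to)
    return got_reply
```

In this model, two consecutive runs with `ack=False` succeed the first time and get no reply the second time, while two runs with `ack=True` both succeed. In the real test program the analogous change would be to acknowledge the message returned by `listener.poll()` before replying.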

Cheers,
--
Noel

On Mon, Jul 7, 2014 at 3:43 PM, Gordon Sim <g...@redhat.com> wrote:

> On 07/06/2014 01:02 AM, Noel Burton-Krahn wrote:
>
>> Icehouse
>> oslo-messaging 1.3.0
>> rabbitmq-server 3.1.3
>>
>> We've noticed that nova rpc calls fail often after rabbit restarts.
>> I've tracked it down to oslo/rabbit/kombu timing out if it's forced to
>> reconnect to rabbit. The code below times out waiting for a reply if
>> the topic has been used in a previous run. The reply always arrives the
>> first time a topic is used, or if the topic is none. But, the second
>> run with the same topic will hang with this error:
>>
>> MessagingTimeout: Timed out waiting for a reply to message ID ...
>>
>> This problem seems too basic to not be caught earlier in oslo, but the
>> program below does really reproduce the same symptoms we see in nova
>> when run against a live rabbit server. What's wrong with this picture?
>>
>
> Just a theory, but could the issue with the simple example be the
> following:
>
> * the same queue is used for the first and second run
> * the first request is not acknowledged so when the first test exits its
>   left on the queue
> * on the second attempt, you retrieve the same first request, whose
>   reply-to address is no longer valid so the reply is never delivered
> * you then try to join the sender thread without pulling off another
>   message, so you don't get to the second request
>
> Just a theory as I say. Also doesn't explain the actual issue as you
> observed with nova. Its just a property of this example.
>
> --Gordon.
>
>> Cheers
>> --
>> Noel
>>
>>
>> #! /usr/bin/python
>>
>> from oslo.config import cfg
>> import threading
>> from oslo import messaging
>> import logging
>> import time
>> log = logging.getLogger(__name__)
>>
>> class OsloTest():
>>     def test(self):
>>         # The code below times out waiting for a reply if the topic
>>         # has been used in a previous run. The reply always arrives
>>         # the first time a topic is used, or if the topic is none.
>>         # But, the second run with the same topic will hang with this
>>         # error:
>>         #
>>         # MessagingTimeout: Timed out waiting for a reply to message ID ...
>>         #
>>         topic = 'will_hang_on_second_usage'
>>         #topic = None # never hangs
>>
>>         url = "%(proto)s://%(user)s:%(password)s@%(host)s/" % dict(
>>             proto = 'rabbit',
>>             host = '1.2.3.4',
>>             password = 'xxxxxxxx',
>>             user = 'rabbit-mq-user',
>>             )
>>         transport = messaging.get_transport(cfg.CONF, url)
>>         driver = transport._driver
>>
>>         target = messaging.Target(topic=topic)
>>         listener = driver.listen(target)
>>         ctxt={"context": True}
>>         timeout = 10
>>
>>         def send_main():
>>             log.debug("sending msg")
>>             reply = driver.send(target,
>>                                 ctxt,
>>                                 {'send': 1},
>>                                 wait_for_reply=True,
>>                                 timeout=timeout)
>>
>>             # times out if topic was not None and used before
>>             log.debug("received reply=%r" % (reply,))
>>
>>         send_thread = threading.Thread(target=send_main)
>>         send_thread.daemon = True
>>         send_thread.start()
>>
>>         msg = listener.poll()
>>         log.debug("received msg=%r" % (msg,))
>>
>>         msg.reply({'reply': 1})
>>
>>         log.debug("sent reply")
>>
>>         send_thread.join()
>>
>> if __name__ == '__main__':
>>     FORMAT = '%(asctime)-15s %(process)5d %(thread)5d %(filename)s %(funcName)s %(message)s'
>>     logging.basicConfig(level=logging.DEBUG, format=FORMAT)
>>     OsloTest().test()
>>
>>
>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to     : openstack@lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack