In the meantime, I'm using this horrendous script on compute nodes to check RabbitMQ connectivity. It uses the 'set_host_enabled' RPC call, which in my case is innocuous.
#!/bin/bash

UUID=$(cat /proc/sys/kernel/random/uuid)
RABBIT=$(grep -Po '(?<=rabbit_host = ).+' /etc/nova/nova.conf)
HOSTX=$(hostname)

# Publish a harmless set_host_enabled(enabled=true) RPC message directly to
# this compute node's queue, then check that nova-compute logged the request id.
python -c "
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(\"$RABBIT\"))
channel = connection.channel()
channel.basic_publish(exchange='nova',
    routing_key=\"compute.$HOSTX\",
    properties=pika.BasicProperties(content_type='application/json'),
    body='{ \"version\": \"3.0\", \"_context_request_id\": \"$UUID\", \\
            \"_context_roles\": [\"KeystoneAdmin\", \"KeystoneServiceAdmin\", \"admin\"], \\
            \"_context_user_id\": \"XXX\", \\
            \"_context_project_id\": \"XXX\", \\
            \"method\": \"set_host_enabled\", \\
            \"args\": {\"enabled\": true} \\
          }')
connection.close()"

sleep 2

tail -1000 /var/log/nova/nova-compute.log | grep -q "$UUID" || {
    echo "WARNING: nova-compute not consuming RabbitMQ messages. Last message: $UUID"
    exit 1
}

echo "OK"


On Thu, Jan 15, 2015 at 9:48 PM, Sam Morrison <sorri...@gmail.com> wrote:

> We've had a lot of issues with Icehouse related to RabbitMQ. Basically the
> change from openstack.rpc to oslo.messaging broke things. These things are
> now fixed in oslo.messaging version 1.5.1; there is still an issue with
> heartbeats, and that patch is making its way through the review process now.
>
> https://review.openstack.org/#/c/146047/
>
> Cheers,
> Sam
>
>
> On 16 Jan 2015, at 10:55 am, sridhar basam <sridhar.ba...@gmail.com>
> wrote:
>
> If you are using HA queues, use a version of RabbitMQ > 3.3.0. There was a
> change in that version where consumption on queues was automatically
> enabled when a master election for a queue happened. Previous versions only
> informed clients that they had to reconsume on a queue; it was the client's
> responsibility to start consumption on a queue.
>
> Make sure you enable TCP keepalives with a low enough value in case you
> have a firewall device in between your rabbit server and its consumers.
>
> Monitor consumers on your rabbit infrastructure using 'rabbitmqctl
> list_queues name messages consumers'. The number of consumers on fanout
> queues will depend on the number of services of any type you have in your
> environment.
>
> Sri
>
> On Jan 15, 2015 6:27 PM, "Michael Dorman" <mdor...@godaddy.com> wrote:
>
>> Here is the bug I've been tracking related to this for a while. I
>> haven't really kept up to speed with it, so I don't know the current
>> status.
>>
>> https://bugs.launchpad.net/nova/+bug/856764
>>
>>
>> From: Kris Lindgren <klindg...@godaddy.com>
>> Date: Thursday, January 15, 2015 at 12:10 PM
>> To: Gustavo Randich <gustavo.rand...@gmail.com>, OpenStack Operators
>> <openstack-operators@lists.openstack.org>
>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>> connectivity
>>
>> During the Atlanta ops meeting this topic came up and I specifically
>> mentioned adding a "no-op" or healthcheck ping to the rabbitmq stuff in
>> both nova & neutron. The devs in the room looked at me like I was crazy,
>> but it was so that we could catch exactly the issues you describe. I am
>> also interested if anyone knows of a lightweight call that could be used
>> to verify/confirm rabbitmq connectivity as well. I haven't been able to
>> devote time to dig into it, mainly because if one client is having issues
>> you will notice other clients having similar/silent errors, and a restart
>> of all the things is the easiest way to fix it, for us at least.
>> ____________________________________________
>>
>> Kris Lindgren
>> Senior Linux Systems Engineer
>> GoDaddy, LLC.
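(A minimal sketch of Sri's two suggestions above, assuming Linux compute
nodes and nova's default "compute.<hostname>" queue naming; the keepalive
values are illustrative, not a recommendation.)

#!/bin/bash
# Lower the kernel TCP keepalive timers so idle AMQP connections are probed
# before a stateful firewall silently drops them. Note: these only affect
# sockets that actually enable SO_KEEPALIVE; persist them via sysctl.conf.
sysctl -w net.ipv4.tcp_keepalive_time=30    # idle seconds before the first probe
sysctl -w net.ipv4.tcp_keepalive_intvl=5    # seconds between probes
sysctl -w net.ipv4.tcp_keepalive_probes=5   # failed probes before the kernel drops the connection

# Flag per-compute queues with zero consumers, based on the
# 'rabbitmqctl list_queues name messages consumers' output Sri mentions.
rabbitmqctl list_queues name messages consumers | \
    awk '$1 ~ /^compute\./ && $3 == 0 {print "WARNING: no consumer on "$1" ("$2" messages queued)"}'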
>>
>> From: Gustavo Randich <gustavo.rand...@gmail.com>
>> Date: Thursday, January 15, 2015 at 11:53 AM
>> To: "openstack-operators@lists.openstack.org"
>> <openstack-operators@lists.openstack.org>
>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>> connectivity
>>
>> Just to add one more background scenario: we also had similar problems
>> trying to load balance rabbitmq via F5 Big IP LTM, and for that reason we
>> don't use it now. Our installation is a single rabbitmq instance with no
>> intermediaries (other than network switches). We use Folsom and Icehouse,
>> with the problem perceived more on Icehouse nodes.
>>
>> We are already monitoring message queue size, but we would like to
>> pinpoint in semi-realtime the specific hosts/racks/network paths
>> experiencing the "stale connection" before a user complains about an
>> operation being stuck, or even hosts with no such pending operations but
>> already "disconnected" -- that way we could also diagnose possible network
>> causes and avoid mass service restarts.
>>
>> So, for now, if someone knows about a cheap and quick openstack operation
>> that triggers a message interchange between rabbitmq and nova-compute, and
>> a way of checking the result, it would be great.
>>
>>
>> On Thu, Jan 15, 2015 at 1:45 PM, Kris G. Lindgren <klindg...@godaddy.com>
>> wrote:
>>
>>> We did have an issue using celery on an internal application that we
>>> wrote - but I believe it was fixed after much failover testing and code
>>> changes. We also use logstash via rabbitmq and haven't noticed any issues
>>> there either.
>>>
>>> So this seems to be just openstack/oslo related.
>>>
>>> We have tried a number of different configurations - all of them had
>>> their issues. We started out listing all the members in the cluster on
>>> the rabbit_hosts line. This worked most of the time without issue, until
>>> we would restart one of the servers; then it seemed like the clients
>>> wouldn't figure out they were disconnected and reconnect to the next host.
>>>
>>> In an attempt to solve that we moved to using haproxy to present a VIP
>>> that we configured in the rabbit_hosts line. This created issues with
>>> long-lived connections being disconnected, and a bunch of other issues.
>>> In our production environment we moved to load-balanced rabbitmq, but
>>> using a real load balancer, and don't have the weird disconnect issues.
>>> However, any time we reboot/take down a rabbitmq host or pull a member
>>> from the cluster we have issues, and if there is a network disruption we
>>> also have issues.
>>>
>>> We're thinking the best course of action is to move rabbitmq off onto
>>> its own box and leave it alone.
>>>
>>> Does anyone have a rabbitmq setup that works well and doesn't have
>>> random issues when pulling nodes for maintenance?
>>> ____________________________________________
>>>
>>> Kris Lindgren
>>> Senior Linux Systems Engineer
>>> GoDaddy, LLC.
>>>
>>>
>>> From: Joe Topjian <j...@topjian.net>
>>> Date: Thursday, January 15, 2015 at 9:29 AM
>>> To: "Kris G. Lindgren" <klindg...@godaddy.com>
>>> Cc: "openstack-operators@lists.openstack.org"
>>> <openstack-operators@lists.openstack.org>
>>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>>> connectivity
>>>
>>> Hi Kris,
>>>
>>>> Our experience is pretty much the same on anything that is using
>>>> rabbitmq - not just nova-compute.
>>>
>>> Just to clarify: have you experienced this outside of OpenStack (or
>>> Oslo)?
>>>
>>> We've seen similar issues with rabbitmq and OpenStack. We used to run
>>> rabbit through haproxy and tried a myriad of options, like setting no
>>> timeouts, very, very long timeouts, etc., but would always eventually
>>> see issues similar to those described.
>>>
>>> Last month, we reconfigured all OpenStack components to use the
>>> `rabbit_hosts` option with all nodes in our cluster listed. So far this
>>> has worked well, though I probably just jinxed myself. :)
>>>
>>> We still have other services (like Sensu) using the same rabbitmq
>>> cluster and accessing it through haproxy. We've never had any issues
>>> there.
>>>
>>> What's also strange is that I have another OpenStack deployment (from
>>> Folsom to Icehouse) with just a single rabbitmq server installed directly
>>> on the cloud controller (meaning: no nova-compute). I never have any
>>> rabbit issues in that cloud.
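(For anyone wanting to try the `rabbit_hosts` approach Joe and Kris describe,
here is a minimal sketch of the Icehouse-era nova.conf settings; the hostnames
and retry values below are placeholders, not a recommendation.)

[DEFAULT]
# List every member of the RabbitMQ cluster instead of a load-balanced VIP.
rabbit_hosts = rabbit01:5672,rabbit02:5672,rabbit03:5672
# Use mirrored queues (also requires a matching ha-mode policy on the RabbitMQ side).
rabbit_ha_queues = true
# Keep retrying the listed hosts rather than giving up on a flapping node.
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
rabbit_max_retries = 0

Note that these settings only control how clients pick and retry brokers; they
don't detect a half-dead connection on their own, which is what the
oslo.messaging heartbeat patch Sam links above is meant to address.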
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators