If you are using HA queues, use a version of RabbitMQ later than 3.3.0. There was a change in that version where consumption on a queue is automatically re-enabled when a master election for that queue happens. Previous versions only informed clients that they had to reconsume on the queue; it was the client's responsibility to start consuming again.
Make sure you enable TCP keepalives and set them to a low enough interval in case you have a firewall device in between your rabbit server and its consumers.

Monitor consumers on your rabbit infrastructure using 'rabbitmqctl list_queues name messages consumers'. The number of consumers on fanout queues will depend on the number of services of each type you have in your environment.

Sri

On Jan 15, 2015 6:27 PM, "Michael Dorman" <mdor...@godaddy.com> wrote:

> Here is the bug I’ve been tracking related to this for a while. I
> haven’t really kept up to speed with it, so I don’t know the current status.
>
> https://bugs.launchpad.net/nova/+bug/856764
>
>
> From: Kris Lindgren <klindg...@godaddy.com>
> Date: Thursday, January 15, 2015 at 12:10 PM
> To: Gustavo Randich <gustavo.rand...@gmail.com>, OpenStack Operators <
> openstack-operators@lists.openstack.org>
> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
> connectivity
>
> During the Atlanta ops meeting this topic came up and I specifically
> mentioned adding a "no-op" or healthcheck ping to the rabbitmq stuff
> in both nova & neutron. The devs in the room looked at me like I was
> crazy, but it was so that we could catch exactly the issues you described.
> I am also interested if anyone knows of a lightweight call that could be
> used to verify/confirm rabbitmq connectivity as well. I haven't been able
> to devote time to dig into it. Mainly because if one client is having
> issues - you will notice other clients are having similar/silent errors and
> a restart of all the things is the easiest way to fix, for us at least.
> ____________________________________________
>
> Kris Lindgren
> Senior Linux Systems Engineer
> GoDaddy, LLC.
>
>
> From: Gustavo Randich <gustavo.rand...@gmail.com>
> Date: Thursday, January 15, 2015 at 11:53 AM
> To: "openstack-operators@lists.openstack.org" <
> openstack-operators@lists.openstack.org>
> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
> connectivity
>
> Just to add one more background scenario, we also had similar problems
> trying to load balance rabbitmq via F5 Big IP LTM. For that reason we don't
> use it now. Our installation is a single rabbitmq instance and no
> intermediaries (albeit network switches). We use Folsom and Icehouse, the
> problem being perceived more in Icehouse nodes.
>
> We are already monitoring message queue size, but we would like to
> pinpoint in semi-realtime the specific hosts/racks/network paths
> experiencing the "stale connection" before a user complains about an
> operation being stuck, or even hosts with no such pending operations but
> already "disconnected" -- we also could diagnose possible network causes
> and avoid massive service restarting.
>
> So, for now, if someone knows about a cheap and quick openstack
> operation that triggers a message interchange between rabbitmq and
> nova-compute and a way of checking the result it would be great.
>
>
>
>
> On Thu, Jan 15, 2015 at 1:45 PM, Kris G. Lindgren <klindg...@godaddy.com>
> wrote:
>
>> We did have an issue using celery on an internal application that we
>> wrote - but I believe it was fixed after much failover testing and code
>> changes. We also use logstash via rabbitmq and haven't noticed any issues
>> there either.
>>
>> So this seems to be just openstack/oslo related.
>>
>> We have tried a number of different configurations - all of them had
>> their issues. We started out listing all the members in the cluster on the
>> rabbit_hosts line. This worked most of the time without issue, until we
>> would restart one of the servers, then it seemed like the clients wouldn't
>> figure out they were disconnected and reconnect to the next host.
>>
>> In an attempt to solve that we moved to using haproxy to present a vip
>> that we configured in the rabbit_hosts line. This created issues with
>> long-lived connections being disconnected and a bunch of other issues. In our
>> production environment we moved to load balanced rabbitmq, but using a real
>> loadbalancer, and don’t have the weird disconnect issues. However, anytime
>> we reboot/take down a rabbitmq host or pull a member from the cluster we
>> have issues, or if there is a network disruption we also have issues.
>>
>> Thinking the best course of action is to move rabbitmq off on to its
>> own box and to leave it alone.
>>
>> Does anyone have a rabbitmq setup that works well and doesn’t have
>> random issues when pulling nodes for maintenance?
>> ____________________________________________
>>
>> Kris Lindgren
>> Senior Linux Systems Engineer
>> GoDaddy, LLC.
>>
>>
>> From: Joe Topjian <j...@topjian.net>
>> Date: Thursday, January 15, 2015 at 9:29 AM
>> To: "Kris G. Lindgren" <klindg...@godaddy.com>
>> Cc: "openstack-operators@lists.openstack.org" <
>> openstack-operators@lists.openstack.org>
>> Subject: Re: [Openstack-operators] Way to check compute <-> rabbitmq
>> connectivity
>>
>> Hi Kris,
>>
>>> Our experience is pretty much the same on anything that is using
>>> rabbitmq - not just nova-compute.
>>>
>>
>> Just to clarify: have you experienced this outside of OpenStack (or
>> Oslo)?
>>
>> We've seen similar issues with rabbitmq and OpenStack. We used to run
>> rabbit through haproxy and tried a myriad of options like setting no
>> timeouts, very very long timeouts, etc, but would always eventually see
>> similar issues as described.
>>
>> Last month, we reconfigured all OpenStack components to use the
>> `rabbit_hosts` option with all nodes in our cluster listed. So far this has
>> worked well, though I probably just jinxed myself. :)
>>
>> We still have other services (like Sensu) using the same rabbitmq
>> cluster and accessing it through haproxy.
>> We've never had any issues there.
>>
>> What's also strange is that I have another OpenStack deployment (from
>> Folsom to Icehouse) with just a single rabbitmq server installed directly
>> on the cloud controller (meaning: no nova-compute). I never have any rabbit
>> issues in that cloud.
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>>
>
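As a footnote to the consumer monitoring suggested at the top of this reply, here is a rough sketch of how the 'rabbitmqctl list_queues name messages consumers' output could be checked for queues that have lost their consumers. It assumes the output is tab-separated with one queue per row, and the function names (parse_list_queues, queues_without_consumers, check_local_broker) are made up for illustration; treat it as a starting point, not a tested monitoring check.

```python
import subprocess


def parse_list_queues(output):
    """Parse rows of 'name<TAB>messages<TAB>consumers' into tuples.

    Assumption: rabbitmqctl prints tab-separated columns. Banner lines
    ("Listing queues ...", "...done.") and any column header row are
    skipped because they do not have two numeric fields.
    """
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        rows.append((parts[0], int(parts[1]), int(parts[2])))
    return rows


def queues_without_consumers(rows):
    """Return (name, messages) for every queue that has zero consumers."""
    return [(name, messages) for name, messages, consumers in rows
            if consumers == 0]


def check_local_broker():
    """Run rabbitmqctl on this host and report consumer-less queues.

    Hypothetical helper: needs to run on a rabbit node with permission
    to execute rabbitmqctl.
    """
    out = subprocess.check_output(
        ["rabbitmqctl", "list_queues", "name", "messages", "consumers"],
        universal_newlines=True,
    )
    return queues_without_consumers(parse_list_queues(out))
```

Calling check_local_broker() from cron or a monitoring agent and alerting on a non-empty result would flag a queue whose consumer has silently gone away. Which queues are expected to have consumers depends on your environment, so you would probably want to filter by queue name first.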