Nodes can become unavailable with Shovel or Federation as well. While both plugins will enqueue undelivered/unconfirmed messages internally, recovery can take a while or never happen. So replication is certainly necessary.
RabbitMQ has a guide that mentions several possible failure scenarios:

http://www.rabbitmq.com/reliability.html

Note that a lot of them do not even involve a messaging server, and would be just as relevant for Pacemaker setups. This is as much an oslo.messaging design concern as whatever messaging technology is used.

There's ongoing work on publisher confirms (one of the things oslo.messaging must have) and heartbeats, for faster peer unavailability detection. Shovel or Federation or AP-style mirroring wouldn't change this.

So please clarify what problems are being tackled here. Currently there are several largely unrelated things mentioned: rabbitmqctl timeouts, Fuel provisioning, Mnesia being very consistency-oriented, desired oslo.messaging fault tolerance improvements.

I'm not sure how some of these relate to each other and why OpenStack has to work around issues that should be reported to the RabbitMQ team.

I will push for introducing the most basic timeout support in ctl in the next bug fix release.

On Mon, Jun 8, 2015 at 5:24 PM, Bogdan Dobrelya <[email protected]> wrote:

> > RabbitMQ team member here.
>
> Thank you for a quick response, Michael!
>
> > Neither Shovel nor Federation will replace mirroring. Shovel moves messages
> > from a queue to an exchange (within a single node or between remote nodes and/or clusters).
> > It doesn't replicate anything.
>
> Yes, the idea was to not just replace, but redesign OpenStack libs to
> use cluster-less messaging as well. It should assume that some messages
> from RPC conversations may be lost. And that messages aren't synced
> between different AMQP nodes specified in the config of OpenStack
> services (rabbit_hosts=).
>
> > Federation has two parts to it:
> >
> > * Queue federation: no replicate, distributes messages from a single logical queue
> >   between N nodes or clusters, when there are no local consumers to consume them.
> > * Exchange federation replicates a stream of messages going through an exchange.
> >   As messages are consumed upstream, downstream has no way of knowing about it.
> >
> > The right thing to do here is introduce timeouts to rabbitmqctl, which was 99% finished
> > in the past but some RabbitMQ team members felt it should produce more detailed
> > error messages, which extended the scope of the change significantly.
> >
> > While Mnesia indeed needs to be replaced to introduce AP (as in CAP) style mirroring,
> > the issue you're bringing up here has nothing to do with Mnesia.
> > Mnesia is not used by rabbitmqctl, and it is not used to store messages.
> > It's a rabbitmqctl issue, and potentially a hint that you may want to reduce
> > net_ticktime value (say, to 5-10 seconds) to make queue master unavailability
> > detected faster.
>
> Thank you, I updated the bug comments [0]. We will test this option as
> well.
>
> [0] https://bugs.launchpad.net/fuel/+bug/1460762/comments/23
>
> > 1. http://www.rabbitmq.com/nettick.html
> > --
> > MK
> >
> > Staff Software Engineer, Pivotal/RabbitMQ
>
> --
> Best regards,
> Bogdan Dobrelya,
> Skype #bogdando_at_yahoo.com
> Irc #bogdando
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--
MK

Staff Software Engineer, Pivotal/RabbitMQ
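
P.S. Until rabbitmqctl has timeout support of its own, a caller can bound it from the outside. Below is a minimal Python sketch of that idea, not anything shipped by RabbitMQ, Fuel or oslo.messaging. The helper name `run_bounded` and the time limits are assumptions made up for illustration; the only real interface it relies on is Python's standard `subprocess` module.

```python
import subprocess


def run_bounded(cmd, timeout_s):
    """Run cmd with a hard time limit (hypothetical helper, not a RabbitMQ API).

    Returns (exit_code, output). exit_code is None when the command did not
    finish within timeout_s seconds.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
            universal_newlines=True,   # decode output as text
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before raising,
        # so no hung process is left behind by this call.
        return None, ""
    return done.returncode, done.stdout
```

A monitoring script could then call, say, `run_bounded(["rabbitmqctl", "status"], 30)` and treat a `None` exit code as "node did not answer in time" rather than blocking indefinitely. The 30 second figure is arbitrary and would need tuning per deployment.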
