We have a master-slave setup for Redis, running 6 instances of Redis on each physical host, and one floating IP between them.
Each redis instance is part of a single group.

When we fail over the IP in production, I'm observing this sequence of events:

1. Pacemaker brings down the floating IP
2. Pacemaker demotes the master redis instance
3. Pacemaker stops each running redis process in sequence (essentially stopping the group)
4. Pacemaker promotes the slave
5. Pacemaker brings up the floating IP on the former slave

(This follows documented behaviour as I understand it; see http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg05344.html for someone else with a similar problem.)

Under production traffic load, each redis process takes about 4 to 5 seconds to sync to disk and clean up. Because the 6 processes are stopped one after another, a simple failover takes between 24 and 30 seconds, which is too long for us. An acceptable failover time would be under 5 seconds (the lower the better).

Is there a configuration option to change the failover process so that it does *not* stop the group before promoting the secondary?

Alternatively, suggestions on how to get Pacemaker to manage only the state of the redis process, but not the process itself, are welcome. (A process failure can be diagnosed by monitoring the response, or lack thereof, from redis itself, so a dead process and a non-responding one can be treated alike for monitoring purposes.)

Devdas Bhagat

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org