On Thu, Oct 16, 2014 at 11:01 AM, Thomas Herve
<thomas.he...@enovance.com> wrote:
>
>> >> This still doesn't do away with the requirement to reliably detect
>> >> node failure, and to fence misbehaving nodes. Detecting that a node
>> >> has failed, and fencing it if unsure, is a prerequisite for any
>> >> recovery action. So you need Corosync/Pacemaker anyway.
>> >
>> > Obviously, yes. My post covered all of that directly ... the tagging
>> > bit was just additional input into the recovery operation.
>>
>> This is essentially why I am saying using the Pacemaker stack is the
>> smarter approach than hacking something into Ceilometer and Heat. You
>> already need Pacemaker for service availability (and all major vendors
>> have adopted it for that purpose), so a highly available cloud that
>> does *not* use Pacemaker at all won't be a vendor supported option for
>> some time. So people will already be running Pacemaker -- then why not
>> use it for what it's good at?
>
> I may be missing something, but Pacemaker will only provide monitoring of
> your compute node, right? I think the advantage you would get by using
> something like Heat is having an instance agent and provide monitoring of
> your client service, instead of just knowing the status of your hypervisor.
> Hosts can fail, but there is another array of failures that you can't handle
> with the global deployment monitoring.

You *are* missing something, indeed. :) Pacemaker would be a perfectly
fine tool for also monitoring the status of your guests on the hosts.
So arguably, nova-compute could in fact hook in with pcsd
(https://github.com/feist/pcs/tree/master/pcs -- all in Python) down
the road to inject VM monitoring into the Pacemaker configuration.
This would, of course, need to be specific to the hypervisor so it
would be a job for the nova driver, rather than being implemented at
the nova-compute level.

But my hunch is that that sort of thing would be for the L release;
for Kilo the low-hanging fruit would be to defend against host failure
(meaning, compute node failure, unrecoverable nova-compute service
failure, etc.).

Cheers,
Florian

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev