Re: [openstack-dev] [Nova] Automatic evacuate

Florian Haas Thu, 16 Oct 2014 02:28:41 -0700

On Thu, Oct 16, 2014 at 11:01 AM, Thomas Herve
<[email protected]> wrote:
>
>> >> This still doesn't do away with the requirement to reliably detect
>> >> node failure, and to fence misbehaving nodes. Detecting that a node
>> >> has failed, and fencing it if unsure, is a prerequisite for any
>> >> recovery action. So you need Corosync/Pacemaker anyway.
>> >
>> > Obviously, yes.  My post covered all of that directly ... the tagging
>> > bit was just additional input into the recovery operation.
>>
>> This is essentially why I am saying using the Pacemaker stack is the
>> smarter approach than hacking something into Ceilometer and Heat. You
>> already need Pacemaker for service availability (and all major vendors
>> have adopted it for that purpose), so a highly available cloud that
>> does *not* use Pacemaker at all won't be a vendor supported option for
>> some time. So people will already be running Pacemaker — then why not
>> use it for what it's good at?
>
> I may be missing something, but Pacemaker will only provide monitoring of 
> your compute node, right? I think the advantage you would get by using 
> something like Heat is having an instance agent and provide monitoring of 
> your client service, instead of just knowing the status of your hypervisor. 
> Hosts can fail, but there is another array of failures that you can't handle 
> with the global deployment monitoring.


You *are* missing something, indeed. :) Pacemaker would also be a
perfectly fine tool for monitoring the status of the guests on your hosts.
So arguably, nova-compute could in fact hook in with pcsd
(https://github.com/feist/pcs/tree/master/pcs -- all in Python) down
the road to inject VM monitoring into the Pacemaker configuration.
This would, of course, need to be specific to the hypervisor so it
would be a job for the nova driver, rather than being implemented at
the nova-compute level.
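
To make that a bit more concrete, here is a rough sketch of what such a
hook might generate for a libvirt guest. It assumes the guest would be
registered through the pcs command line using the
ocf:heartbeat:VirtualDomain resource agent. The function name, the "vm-"
resource naming scheme and the XML path are hypothetical, made up for
illustration; nothing like this exists in nova today. The sketch only
builds the command and does not run it.

```python
from typing import List


def build_vm_monitor_command(
    domain_name: str,
    domain_xml_path: str,
    hypervisor_uri: str = "qemu:///system",
    monitor_interval: str = "30s",
) -> List[str]:
    """Build (but do not execute) a pcs command that would add a
    monitored libvirt guest to the Pacemaker configuration.

    Hypothetical helper: the "vm-" prefix and the argument names are
    assumptions for this sketch, not part of nova or pcs.
    """
    return [
        "pcs", "resource", "create",
        f"vm-{domain_name}",              # hypothetical resource ID scheme
        "ocf:heartbeat:VirtualDomain",    # resource agent for libvirt domains
        f"config={domain_xml_path}",      # path to the domain's libvirt XML
        f"hypervisor={hypervisor_uri}",   # libvirt connection URI
        "op", "monitor", f"interval={monitor_interval}",
    ]


if __name__ == "__main__":
    # Placeholder path; a real driver would know where its domain XML lives.
    print(" ".join(build_vm_monitor_command("demo", "/path/to/demo.xml")))
```

Everything after "pcs resource create" is libvirt-specific here, which is
why this would belong in the driver: another hypervisor would need a
different resource agent and different parameters.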

But my hunch is that this sort of thing would be for the L release;
for Kilo the low-hanging fruit would be to defend against host failure
(meaning, compute node failure, unrecoverable nova-compute service
failure, etc.).

Cheers,
Florian

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
