> On 5 Aug 2015, at 1:34 am, Joshua Harlow <harlo...@outlook.com> wrote:
> 
> Philipp Marek wrote:
>>> If we end up using a DLM then we have to detect when the connection to
>>> the DLM is lost on a node and stop all ongoing operations to prevent
>>> data corruption.
>>> 
>>> It may not be trivial to do, but we will have to do it in any solution
>>> we use, even on my last proposal that only uses the DB in Volume Manager
>>> we would still need to stop all operations if we lose connection to the
>>> DB.
>> 
>> Well, is it already decided that Pacemaker would be chosen to provide HA in
>> Openstack? There's been a talk "Pacemaker: the PID 1 of Openstack" IIRC.
>> 
>> I know that Pacemaker's been pushed aside in an earlier ML post, but IMO
>> there's already *so much* been done for HA in Pacemaker that Openstack
>> should just use it.
>> 
>> All HA nodes need to participate in a Pacemaker cluster - and if one node
>> loses connection, all services will get stopped automatically (by
>> Pacemaker) - or the node gets fenced.
>> 
>> 
>> No need to invent some sloppy scripts to do exactly the tasks (badly!) that
>> the Linux HA Stack has been providing for quite a few years.
>> 
>> 
>> Yes, Pacemaker needs learning - but not more than any other involved
>> project, and there are already quite a few here, which have to be known to
>> any operator or developer already.
>> 
>> 
>> (BTW, LINBIT sells training for the Linux HA Cluster Stack - and yes,
>>  I work for them ;)
> 
> So just a piece of information, but yahoo (the company I work for, with vms 
> in the tens of thousands, baremetal in the much more than that...) hasn't 
> used pacemaker, and in all honesty this is the first project (openstack) that 
> I have heard that needs such a solution. I feel that we really should be 
> building our services better so that they can be A-A vs having to depend on 
> another piece of software to get around our 'sloppiness' (for lack of a 
> better word).

HA is a deceptively hard problem.
There is really no need for every project to attempt to solve it on its own.
Having each project calculate and consume its own membership list is a very 
good way to go insane.

Aside from the usual bugs, the HA space lends itself to making simplifying 
assumptions early on, only to trap you with them down the road.
It's even worse if you're trying to bolt it on after the fact...

Perhaps try to think of pacemaker as a distributed finite state machine instead 
of a cluster manager.
That is part of the value we bring to projects like galera and rabbitmq.

Sure they are A-A, and once they’re up they can survive many failures, but 
bringing them up can be non-trivial.
We also provide the additional context (e.g. quorum and fencing) that allows more 
kinds of failures to be safely recovered from.
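
To make the quorum and fencing point a bit more concrete, here is a rough Python sketch of the decision a partition has to make. It is only an illustration under simple assumptions (one vote per node, a fixed node list); the function names and action labels are made up for this mail and are not pacemaker's API or its actual algorithm.

```python
def has_quorum(visible_nodes, expected_votes):
    """True if this partition holds a strict majority of the expected votes.

    Assumes one vote per node (an assumption of this sketch).
    """
    return len(visible_nodes) * 2 > expected_votes


def plan_recovery(all_nodes, visible_nodes):
    """Return ordered (action, node) steps for the partition that can see
    visible_nodes. Action names are hypothetical labels, not real commands.
    """
    visible = set(visible_nodes)
    if not has_quorum(visible, len(all_nodes)):
        # Minority partition: it cannot tell whether the other nodes are dead
        # or merely unreachable, so it stops rather than risk two writers.
        return [("stop-services", node) for node in sorted(visible)]

    steps = []
    for node in sorted(set(all_nodes) - visible):
        # Fence first, so the missing node is known to be down before
        # anything it was running gets started elsewhere.
        steps.append(("fence", node))
        steps.append(("recover-resources-of", node))
    return steps


if __name__ == "__main__":
    nodes = ["a", "b", "c"]
    print(plan_recovery(nodes, ["a", "b"]))  # majority side
    print(plan_recovery(nodes, ["c"]))       # minority side
```

The ordering is the important part: the majority side fences before it recovers, and the minority side stops on its own, which is what lets the two sides agree on the outcome without being able to talk to each other.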

Something to think about perhaps.

— Andrew

> 
> Nothing against pacemaker personally... IMHO it just doesn't feel like we are 
> doing this right if we need such a product in the first place.
> 
>> 
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

