> 2) I think you misunderstand the difference between upstart/systemd and
> Pacemaker in this case. There are many cases when you need to have a
> synchronized view of the cluster. Otherwise you will hit split-brain
> situations and have your cluster malfunctioning. Until OpenStack provides
> us with such means, there is no other way than using
> Pacemaker/ZooKeeper/etc.
>
Could you please give some examples of those 'many cases' for OpenStack
specifically?
As for my 'misunderstanding': OpenStack services only need to be always up,
nothing more than that. Upstart does a perfect job there.

> 3) Regarding Neutron agents - we discussed it many times - you need to be
> able to control and clean up stuff after some service has crashed.
> Currently, Neutron does not provide reliable ways to do it. If your agent
> dies and does not clean up IP addresses from the network namespace, you
> will get into a situation of ARP duplication, which is the kind of split
> brain described in item #2. I personally, as a system architect and
> administrator, do not believe this will change for OpenStack in at least
> several years, so we will be using Pacemaker for a very long period of time.
>

This changed already, and a while ago.
The OCF infrastructure around neutron agents has never helped neutron in any
meaningful way and is just an artifact of the dark past.
The reasons are: pacemaker/ocf doesn't have enough intelligence to know when
to engage; as a result, any cleanup could only be achieved through manual
operations. I don't need to remind you how many bugs there were in OCF
scripts that brought whole clusters down after those manual operations.
So it's just way better to go with simple standard tools with fine-grained
control.
The same applies to any other OpenStack service (again, not rabbitmq/galera).

> so we will be using Pacemaker for a very long period of time.

Not for neutron, sorry. Only until we finish the last bit of such cleanup,
which is targeted for 8.0.

> Now, back to the topic - we may decide to use some more sophisticated
> integral node health attribute which can be used with Pacemaker as well as
> to put node into some kind of maintenance mode. We can leverage User
> Maintenance Mode feature here or just simply stop particular services and
> disable particular haproxy backends.
>

I think this kind of attribute, although analyzed by pacemaker/ocf, doesn't
require putting any new OS service under pacemaker control.

Thanks,
Eugene.

>
> On Mon, Oct 5, 2015 at 11:57 PM, Eugene Nikanorov <enikano...@mirantis.com>
> wrote:
>
>>> Mirantis controls neither RabbitMQ nor Galera. Mirantis cannot assure
>>> their quality either.
>>
>> Correct, and rabbitmq was always the pain in the back, preventing any
>> *real* enterprise usage of openstack where reliability does matter.
>>
>>>> > 2) it has terrible UX
>>>
>>> It looks like a personal opinion. I'd like to see surveys or operators'
>>> feedback. Also, this statement is not constructive, as it doesn't offer
>>> alternative solutions.
>>
>> The solution is to get rid of terrible UX wherever possible (I'm not
>> saying it is always possible, of course);
>> upstart is just so much better.
>> And yes, this is my personal opinion, and it is a summary of the
>> escalation team's experience.
>>
>>>> > 3) it is not reliable
>>>
>>> I would say OpenStack services are not HA-reliable, so OCF scripts are
>>> the operators' reaction to these problems. Many of them have childish
>>> issues from release to release. Operators wrote OCF scripts to fix these
>>> problems. A lot of OpenStack services are stateful, so they require some
>>> kind of stickiness or synchronization. OpenStack services don't have
>>> simple health-check functionality, so it's hard to say whether one is
>>> running well or not. SIGHUP is still a problem for many OpenStack
>>> services. Etc., etc. So, let's be constructive here.
>>
>> Well, I prefer to be responsible for what I know and maintain. Thus, I
>> state that neutron doesn't need to be managed by pacemaker, neither the
>> server nor any kind of agent, and that's the path the neutron team will
>> be taking.
>>
>> Thanks,
>> Eugene.
>>
>>>> I disagree with #1, as I do not agree that should be a criterion for an
>>>> open-source project.
>>>> Considering pacemaker is at the core of our controller setup, I would
>>>> argue that if these are in fact true, we need to be using something
>>>> else. I would agree that it is a terrible UX, but all the clustering
>>>> software I've used falls into this category. I'd like more information
>>>> on how it is not reliable. Do we have numbers to back up these claims?
>>>>
>>>> > (3) is not an evaluation of the project itself, but just a logical
>>>> > consequence of (1) and (2).
>>>> > As a part of the escalation team I can say that it has cost our team
>>>> > thousands of man-hours of head-scratching, staring at pacemaker logs
>>>> > whose value is usually slightly below zero.
>>>> >
>>>> > Most OpenStack services (in fact, ALL API servers) are stateless;
>>>> > they don't require any cluster management (also, they don't need to
>>>> > be moved in case of lack of space).
>>>> > Stateful services like neutron agents have their state being a
>>>> > function of DB state and are able to synchronize it with the server
>>>> > without external "help".
>>>> >
>>>>
>>>> So it's not an issue with moving services so much as being able to
>>>> stop the services when a condition is met. Have we tested all OS
>>>> services to ensure they function 100% when out of disk space? I
>>>> would assume that glance might have issues with image uploads if there
>>>> is no space to handle a request.
>>>>
>>>> > So now usage of pacemaker can only be justified for cases where a
>>>> > service's clustering mechanism requires active monitoring (rabbitmq,
>>>> > galera).
>>>> > But even there, examples where we are better off without pacemaker
>>>> > are all around.
>>>> >
>>>> > Thanks,
>>>> > Eugene.
>>>> >
>>>>
>>>> After I sent this email, I had further discussions around the issues
>>>> that I'm facing and it may not be completely related to disk space.
>>>> I think we might be relying on the expectation that the local rabbitmq
>>>> is always available, but I need to look into that. Either way, I
>>>> believe we should still continue to discuss this issue, as we are
>>>> managing services in multiple ways on a single host. Additionally, I do
>>>> not believe that we really perform quality health checks on our
>>>> services.
>>>>
>>>> Thanks,
>>>> -Alex
>>>>
>>>> > On Mon, Oct 5, 2015 at 1:34 PM, Sergey Vasilenko
>>>> > <svasile...@mirantis.com> wrote:
>>>> >>
>>>> >> On Mon, Oct 5, 2015 at 12:22 PM, Eugene Nikanorov
>>>> >> <enikano...@mirantis.com> wrote:
>>>> >>>
>>>> >>> No pacemaker for os services, please.
>>>> >>> We'll be moving neutron agents out of pacemaker control in 8.0;
>>>> >>> other os services don't need it either.
>>>> >>
>>>> >> Could you please provide your arguments?
>>>> >>
>>>> >> /sv
>>>> >>
>>>> >> __________________________________________________________________________
>>>> >> OpenStack Development Mailing List (not for usage questions)
>>>> >> Unsubscribe:
>>>> >> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 35bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com <http://www.mirantis.ru/>
> www.mirantis.ru
> vkuk...@mirantis.com
>