Hello, and thanks very much for your suggestions. On Monday I will send you my pacemaker configuration.

At this time I monitor the nova-compute systemd resource rather than an ocf resource. The nova-compute systemd resource is monitored correctly as long as the pacemaker remote compute node is marked online in the cluster. If I disable the ocf remote compute node resource before rebooting the compute node, it comes back online afterwards and everything works fine. But when I simulate an unexpected compute node reboot, the node stays marked offline forever. I tried "pcs resource cleanup", disable and enable, unmanage and manage, but it remains offline, so I think the ocf remote resource is not behaving correctly.

I resolved the issue with a workaround:

1. Delete the remote compute ocf resource: pcs resource delete compute-0
2. Add an alias in /etc/hosts on all controllers: 10.102.184.90 compute-0 compute-node0
3. Create a new ocf remote resource named compute-node0

(Since I am at home now, I do not remember the exact alias names; the above is just to explain the workaround.) With this workaround the node returns to work. Honestly, the workaround is bad, because in a few months I could end up with a lot of aliases :-(

Regards
Ignazio
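P.S. For clarity, this is roughly what the workaround looks like as commands (typed from memory, so the exact names and the monitor interval below are approximate rather than the literal configuration on my cluster):

  # 1. Remove the remote node resource that is stuck offline
  pcs resource delete compute-0

  # 2. On every controller, add an alias for the same IP in /etc/hosts:
  #      10.102.184.90 compute-0 compute-node0

  # 3. Re-create the remote node resource under the new alias; with no
  #    server= parameter, ocf:pacemaker:remote connects to the host that
  #    the resource name resolves to, i.e. the new /etc/hosts alias
  pcs resource create compute-node0 ocf:pacemaker:remote op monitor interval=20s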
On 13/May/2017 16:55, "Sam P" <sam47pr...@gmail.com> wrote:
> Hi,
>
> This might not be exactly what you are looking for... but you may extend
> this.
> In Masakari [0], we use pacemaker-remote in masakari-monitors [1] to
> monitor node failures.
> In [1] there is hostmonitor.sh, which is going to be deprecated in the next
> cycle, but it is a straightforward way to do this.
> [0] https://wiki.openstack.org/wiki/Masakari
> [1] https://github.com/openstack/masakari-monitors/tree/master/masakarimonitors/hostmonitor
>
> Then there are the pacemaker resource agents:
> https://github.com/openstack/openstack-resource-agents/tree/master/ocf
>
> > I have already tried "pcs resource cleanup" but it cleans fine all
> > resources but not remote nodes.
> > In any case, on Monday I'll send what you requested.
> Hope we can get more details on Monday.
>
> --- Regards,
> Sampath
>
>
> On Sat, May 13, 2017 at 9:52 PM, Ignazio Cassano
> <ignaziocass...@gmail.com> wrote:
> > Thanks Curtis.
> > I have already tried "pcs resource cleanup" but it cleans fine all
> > resources but not remote nodes.
> > In any case, on Monday I'll send what you requested.
> > Regards
> > Ignazio
> >
> > On 13/May/2017 14:27, "Curtis" <serverasc...@gmail.com> wrote:
> >
> > On Fri, May 12, 2017 at 10:23 PM, Ignazio Cassano
> > <ignaziocass...@gmail.com> wrote:
> >> Hi Curtis, at this time I am using remote pacemaker only for controlling
> >> openstack services on compute nodes (neutron openvswitch-agent,
> >> nova-compute, ceilometer compute). I wrote my own ansible playbooks to
> >> install and configure all components.
> >> A second step could be to extend it for VM high availability.
> >> I did not find any procedure for cleaning up a compute node after
> >> rebooting, and I googled a lot without luck.
> >
> > Can you paste some output of something like "pcs status" and I can try
> > to take a look?
> >
> > I've only used pacemaker a little, but I'm fairly sure it's going to
> > be something like "pcs resource cleanup <resource_id>".
> >
> > Thanks,
> > Curtis.
> >
> >> Regards
> >> Ignazio
> >>
> >> On 13/May/2017 00:32, "Curtis" <serverasc...@gmail.com> wrote:
> >>
> >> On Fri, May 12, 2017 at 8:51 AM, Ignazio Cassano
> >> <ignaziocass...@gmail.com> wrote:
> >>> Hello All,
> >>> I installed OpenStack Newton with a pacemaker cluster made up of 3
> >>> controllers and 2 compute nodes. All nodes run CentOS 7.3.
> >>> The compute nodes are managed with the remote pacemaker ocf resource.
> >>> If, before shutting down a compute node, I disable the compute node
> >>> resource in the cluster and enable it when the compute node comes back
> >>> up, it works fine and the cluster shows it online.
> >>> If the compute node goes down before disabling the compute node resource
> >>> in the cluster, it remains offline even after it is powered up again.
> >>> The only solution I found is removing the compute node resource from the
> >>> cluster and adding it again with a different name (adding this new name
> >>> to the /etc/hosts file on all controllers).
> >>> With the above workaround it returns online for the cluster and all its
> >>> resources (openstack-nova-compute etc.) return to work fine.
> >>> Please, does anyone know a better solution?
> >>
> >> What are you using pacemaker for on the compute nodes? I have not done
> >> that personally, but my impression is that sometimes people do that in
> >> order to have virtual machines restarted somewhere else should the
> >> compute node go down outside of a maintenance window (i.e. "instance
> >> high availability"). Is that your use case? If so, I would imagine
> >> there is some kind of clean-up procedure to put the compute node back
> >> into use when pacemaker thinks it has failed. Did you use some kind of
> >> openstack distribution or follow a particular installation document to
> >> enable this pacemaker setup?
> >>
> >> It sounds like everything is working as expected (if my guess is
> >> right) and you just need the right steps to bring the node back into
> >> the cluster.
> >>
> >> Thanks,
> >> Curtis.
> >>
> >>> Regards
> >>> Ignazio