Sorry! Pacemaker 1.1.10, Corosync 2.3.30.

BTW, I removed quorum.two_node:1 from corosync.conf, and it helped! Now an
isolated node stops its services in the 3-node cluster. Was it the right
solution?
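For reference, this is roughly what the quorum section of my corosync.conf
looks like after the change (a plain three-node votequorum setup with no
two_node override; expected_votes is normally derived from the nodelist, I
only spell it out here for clarity):

    quorum {
        provider: corosync_votequorum
        expected_votes: 3
    }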
On Wednesday, January 14, 2015, Andrew Beekhof <and...@beekhof.net> wrote:

> On 14 Jan 2015, at 12:06 am, Dmitry Koterov <dmitry.kote...@gmail.com> wrote:
>
> > Then I see that, although node2 clearly knows it's isolated (it doesn't
> > see the other 2 nodes and does not have quorum)
>
> we don't know that - there are several algorithms for calculating quorum
> and the information isn't included in your output.
>
> are you using cman, or corosync underneath pacemaker? corosync version?
> pacemaker version? have you set no-quorum-policy?
>
> > no-quorum-policy is not set, so, according to
> > http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-cluster-options.html,
> > it is "stop - stop all resources in the affected cluster partition". I
> > suppose this is the right option, but why are the resources not stopped
> > on the node when this one node of three becomes isolated and the node
> > clearly sees the other nodes as offline (so it knows it's isolated)?
> > What should I configure in addition?
> >
> > I'm using corosync+pacemaker, no cman. Below (in quotes) is the output
> > of "crm configure show". Versions are from Ubuntu 14.04, so almost new.
>
> I don't have Ubuntu installed. You'll have to be more specific as to what
> package versions you have.
>
> > , it does not stop its services:
> >
> > root@node2:~# crm status
> > Online: [ node2 ]
> > OFFLINE: [ node1 node3 ]
> >
> > Master/Slave Set: ms_drbd [drbd]
> >     Masters: [ node2 ]
> >     Stopped: [ node1 node3 ]
> > Resource Group: server
> >     fs          (ocf::heartbeat:Filesystem):    Started node2
> >     postgresql  (lsb:postgresql):               Started node2
> >     bind9       (lsb:bind9):                    Started node2
> >     nginx       (lsb:nginx):                    Started node2
> >
> > So is there a way to tell pacemaker to shut down nodes' services when
> > they become isolated?
> >
> > On Mon, Jan 12, 2015 at 8:25 PM, David Vossel <dvos...@redhat.com> wrote:
> >
> > > ----- Original Message -----
> > > > Hello.
> > > >
> > > > I have a 3-node cluster managed by corosync+pacemaker+crm. Node1 and
> > > > Node2 are DRBD master-slave, and they also have a number of other
> > > > services installed (postgresql, nginx, ...). Node3 is just a corosync
> > > > node (for quorum); no DRBD/postgresql/... are installed on it, only
> > > > corosync+pacemaker.
> > > >
> > > > But when I add resources to the cluster, a part of them are somehow
> > > > moved to node3 and then fail. Note that I have a "colocation" directive
> > > > to place these resources on the DRBD master only and a "location" with
> > > > -inf for node3, but this does not help - why? How can I make pacemaker
> > > > not run anything on node3?
> > > >
> > > > All the resources are added in a single transaction: "cat config.txt |
> > > > crm -w -f- configure", where config.txt contains the directives and a
> > > > "commit" statement at the end.
> > > >
> > > > Below are "crm status" (error messages) and "crm configure show"
> > > > outputs.
> > > >
> > > > root@node3:~# crm status
> > > > Current DC: node2 (1017525950) - partition with quorum
> > > > 3 Nodes configured
> > > > 6 Resources configured
> > > >
> > > > Online: [ node1 node2 node3 ]
> > > >
> > > > Master/Slave Set: ms_drbd [drbd]
> > > >     Masters: [ node1 ]
> > > >     Slaves: [ node2 ]
> > > > Resource Group: server
> > > >     fs          (ocf::heartbeat:Filesystem):    Started node1
> > > >     postgresql  (lsb:postgresql):               Started node3 FAILED
> > > >     bind9       (lsb:bind9):                    Started node3 FAILED
> > > >     nginx       (lsb:nginx):                    Started node3 (unmanaged) FAILED
> > > >
> > > > Failed actions:
> > > >     drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
> > > >         last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not installed
> > > >     postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
> > > >         last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown error
> > > >     bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
> > > >         last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown error
> > > >     nginx_stop_0 (node=node3, call=767, rc=5, status=complete,
> > > >         last-rc-change=Mon Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed
> > >
> > > Here's what is going on. Even when you say "never run this resource on
> > > node3", pacemaker is going to probe for the resource on node3 anyway,
> > > just to verify the resource isn't running.
> > >
> > > The failures you are seeing ("monitor_0" failed) indicate that pacemaker
> > > was unable to verify whether the resources are running on node3, because
> > > the related packages for the resources are not installed there. Given
> > > pacemaker's default behavior, I'd expect this.
> > >
> > > You have two options.
> > >
> > > 1. Install the resource-related packages on node3 even though you never
> > > want them to run there. This will allow the resource agents to verify
> > > that the resource is in fact inactive.
> > >
> > > 2. If you are using the current master branch of pacemaker, there's a new
> > > location constraint option called 'resource-discovery=always|never|exclusive'.
> > > If you add the 'resource-discovery=never' option to the location
> > > constraint that attempts to keep resources off node3, you'll avoid having
> > > pacemaker perform the 'monitor_0' actions on node3 as well.
> > > -- Vossel
> >
> > > > root@node3:~# crm configure show | cat
> > > > node $id="1017525950" node2
> > > > node $id="13071578" node3
> > > > node $id="1760315215" node1
> > > > primitive drbd ocf:linbit:drbd \
> > > >     params drbd_resource="vlv" \
> > > >     op start interval="0" timeout="240" \
> > > >     op stop interval="0" timeout="120"
> > > > primitive fs ocf:heartbeat:Filesystem \
> > > >     params device="/dev/drbd0" directory="/var/lib/vlv.drbd/root" options="noatime,nodiratime" fstype="xfs" \
> > > >     op start interval="0" timeout="300" \
> > > >     op stop interval="0" timeout="300"
> > > > primitive postgresql lsb:postgresql \
> > > >     op monitor interval="10" timeout="60" \
> > > >     op start interval="0" timeout="60" \
> > > >     op stop interval="0" timeout="60"
> > > > primitive bind9 lsb:bind9 \
> > > >     op monitor interval="10" timeout="60" \
> > > >     op start interval="0" timeout="60" \
> > > >     op stop interval="0" timeout="60"
> > > > primitive nginx lsb:nginx \
> > > >     op monitor interval="10" timeout="60" \
> > > >     op start interval="0" timeout="60" \
> > > >     op stop interval="0" timeout="60"
> > > > group server fs postgresql bind9 nginx
> > > > ms ms_drbd drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > > > location loc_server server rule $id="loc_server-rule" -inf: #uname eq node3
> > > > colocation col_server inf: server ms_drbd:Master
> > > > order ord_server inf: ms_drbd:promote server:start
> > > > property $id="cib-bootstrap-options" \
> > > >     stonith-enabled="false" \
> > > >     last-lrm-refresh="1421079189" \
> > > >     maintenance-mode="false"
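P.S. For anyone finding this thread later: the resource-discovery=never option
David mentions above is not available in my pacemaker 1.1.10, but on newer
versions the location constraint that keeps the group off node3 could carry
it. In raw CIB XML it would look roughly like this (the constraint id is just
an example):

    <rsc_location id="loc_server_no_node3" rsc="server" node="node3"
                  score="-INFINITY" resource-discovery="never"/>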
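P.P.S. And since Andrew asked about no-quorum-policy: I'm relying on the
default ("stop"), but it can also be set explicitly so the intent is visible
in the configuration, e.g.:

    crm configure property no-quorum-policy=stop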
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org