> On 14 Jan 2015, at 12:06 am, Dmitry Koterov <dmitry.kote...@gmail.com> wrote:
>
>
> > Then I see that, although node2 clearly knows it's isolated (it doesn't see
> > other 2 nodes and does not have quorum)
>
> we don't know that - there are several algorithms for calculating quorum and
> the information isn't included in your output.
> are you using cman, or corosync underneath pacemaker? corosync version?
> pacemaker version? have you set no-quorum-policy?
>
> no-quorum-policy is not set, so, according to
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-cluster-options.html
> , it is "stop - stop all resources in the affected cluster parition". I
> suppose this is the right option, but why the resources are not stopped on
> the node when this one node of three becomes isolated and the node clearly
> sees other nodes as offline (so it knows it's isolated)? What should I
> configure in addition?
>
> I'm using corosync+pacemaker, no cman. Below (in quotes) is output of "crm
> configure show". Versions are from Ubuntu 14.04, so almost new.

I don't have Ubuntu installed. You'll have to be more specific as to what
package versions you have.

>
> > , it does not stop its services:
> >
> > root@node2:~# crm status
> > Online: [ node2 ]
> > OFFLINE: [ node1 node3 ]
> > Master/Slave Set: ms_drbd [drbd]
> > Masters: [ node2 ]
> > Stopped: [ node1 node3 ]
> > Resource Group: server
> > fs (ocf::heartbeat:Filesystem): Started node2
> > postgresql (lsb:postgresql): Started node2
> > bind9 (lsb:bind9): Started node2
> > nginx (lsb:nginx): Started node2
> >
> > So is there a way to say pacemaker to shutdown nodes' services when they
> > become isolated?
> >
> >
> > On Mon, Jan 12, 2015 at 8:25 PM, David Vossel <dvos...@redhat.com> wrote:
> >
> >
> > ----- Original Message -----
> > > Hello.
> > >
> > > I have 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2
> > > are DRBD master-slave, also they have a number of other services installed
> > > (postgresql, nginx, ...). Node3 is just a corosync node (for quorum), no
> > > DRBD/postgresql/... are installed at it, only corosync+pacemaker.
> > >
> > > But when I add resources to the cluster, a part of them are somehow moved
> > > to node3 and since then fail. Note than I have a "colocation" directive to
> > > place these resources to the DRBD master only and "location" with -inf for
> > > node3, but this does not help - why? How to make pacemaker not run
> > > anything at node3?
> > >
> > > All the resources are added in a single transaction:
> > > "cat config.txt | crm -w -f- configure" where config.txt contains
> > > directives and "commit" statement at the end.
> > >
> > > Below are "crm status" (error messages) and "crm configure show" outputs.
> > >
> > > root@node3:~# crm status
> > > Current DC: node2 (1017525950) - partition with quorum
> > > 3 Nodes configured
> > > 6 Resources configured
> > > Online: [ node1 node2 node3 ]
> > > Master/Slave Set: ms_drbd [drbd]
> > > Masters: [ node1 ]
> > > Slaves: [ node2 ]
> > > Resource Group: server
> > > fs (ocf::heartbeat:Filesystem): Started node1
> > > postgresql (lsb:postgresql): Started node3 FAILED
> > > bind9 (lsb:bind9): Started node3 FAILED
> > > nginx (lsb:nginx): Started node3 (unmanaged) FAILED
> > > Failed actions:
> > > drbd_monitor_0 (node=node3, call=744, rc=5, status=complete, last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not installed
> > > postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete, last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown error
> > > bind9_monitor_0 (node=node3, call=757, rc=1, status=complete, last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown error
> > > nginx_stop_0 (node=node3, call=767, rc=5, status=complete, last-rc-change=Mon Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed
> >
> > Here's what is going on. Even when you say "never run this resource on
> > node3" pacemaker is going to probe for the resource regardless on node3
> > just to verify the resource isn't running.
> >
> > The failures you are seeing "monitor_0 failed" indicate that pacemaker
> > failed to be able to verify resources are running on node3 because the
> > related packages for the resources are not installed. Given pacemaker's
> > default behavior I'd expect this.
> >
> > You have two options.
> >
> > 1. install the resource related packages on node3 even though you never
> > want them to run there. This will allow the resource-agents to verify the
> > resource is in fact inactive.
> >
> > 2. If you are using the current master branch of pacemaker, there's a new
> > location constraint option called
> > 'resource-discovery=always|never|exclusive'.
> > If you add the 'resource-discovery=never' option to your location
> > constraint that attempts to keep resources from node3, you'll avoid having
> > pacemaker perform the 'monitor_0' actions on node3 as well.
> >
> > -- Vossel
> >
> > >
> > > root@node3:~# crm configure show | cat
> > > node $id="1017525950" node2
> > > node $id="13071578" node3
> > > node $id="1760315215" node1
> > > primitive drbd ocf:linbit:drbd \
> > >     params drbd_resource="vlv" \
> > >     op start interval="0" timeout="240" \
> > >     op stop interval="0" timeout="120"
> > > primitive fs ocf:heartbeat:Filesystem \
> > >     params device="/dev/drbd0" directory="/var/lib/vlv.drbd/root" options="noatime,nodiratime" fstype="xfs" \
> > >     op start interval="0" timeout="300" \
> > >     op stop interval="0" timeout="300"
> > > primitive postgresql lsb:postgresql \
> > >     op monitor interval="10" timeout="60" \
> > >     op start interval="0" timeout="60" \
> > >     op stop interval="0" timeout="60"
> > > primitive bind9 lsb:bind9 \
> > >     op monitor interval="10" timeout="60" \
> > >     op start interval="0" timeout="60" \
> > >     op stop interval="0" timeout="60"
> > > primitive nginx lsb:nginx \
> > >     op monitor interval="10" timeout="60" \
> > >     op start interval="0" timeout="60" \
> > >     op stop interval="0" timeout="60"
> > > group server fs postgresql bind9 nginx
> > > ms ms_drbd drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > > location loc_server server rule $id="loc_server-rule" -inf: #uname eq node3
> > > colocation col_server inf: server ms_drbd:Master
> > > order ord_server inf: ms_drbd:promote server:start
> > > property $id="cib-bootstrap-options" \
> > >     stonith-enabled="false" \
> > >     last-lrm-refresh="1421079189" \
> > >     maintenance-mode="false"
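
As an illustration of option 2 in the quoted reply, here is a rough sketch of how the constraints might be written with probing disabled on node3. It assumes a Pacemaker build that already includes the resource-discovery option and a crmsh version that accepts it as a location attribute, which the packages discussed in this thread may not. The plain node-score form (instead of the original #uname rule) and the extra loc_drbd constraint are hypothetical additions, not part of the posted configuration.

```
# Hypothetical variant of loc_server: keep the group off node3 and
# skip the monitor_0 probes there (needs resource-discovery support).
location loc_server server resource-discovery=never -inf: node3

# Hypothetical extra constraint: the posted configuration has no location
# constraint for ms_drbd, so drbd would still be probed on node3 without one.
location loc_drbd ms_drbd resource-discovery=never -inf: node3
```

Failures that were already recorded on node3 would most likely remain in "crm status" until they are cleared, for example with "crm resource cleanup" for each affected resource.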

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org