> 1. Install the resource-related packages on node3 even though you never want
> them to run there. This will allow the resource agents to verify that the
> resource is in fact inactive.
Thanks, your advice helped: I installed all the services on node3 as well
(including DRBD, but without its configs) and stopped and disabled them. Then
I added the following line to my configuration:

    location loc_drbd drbd rule -inf: #uname eq node3

So node3 is never a target for DRBD, and this helped: "crm node standby node1"
no longer tries to use node3.

But I have another (related) issue. If some node (e.g. node1) becomes isolated
from the other 2 nodes, how can I force it to shut down its services? I cannot
use IPMI-based fencing/stonith, because there are no reliable connections
between the nodes at all (they are in geo-distributed datacenters), and an
IPMI call to shut down one node from another is impossible.

E.g. initially I have the following:

# crm status
Online: [ node1 node2 node3 ]

 Master/Slave Set: ms_drbd [drbd]
     Masters: [ node2 ]
     Slaves: [ node1 ]
 Resource Group: server
     fs          (ocf::heartbeat:Filesystem):    Started node2
     postgresql  (lsb:postgresql):               Started node2
     bind9       (lsb:bind9):                    Started node2
     nginx       (lsb:nginx):                    Started node2

Then I turn on the firewall on node2 to isolate it from the outside network
(only SSH and loopback remain allowed):

root@node2:~# iptables -A INPUT -p tcp --dport 22 -j ACCEPT
root@node2:~# iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT
root@node2:~# iptables -A INPUT -i lo -j ACCEPT
root@node2:~# iptables -A OUTPUT -o lo -j ACCEPT
root@node2:~# iptables -P INPUT DROP; iptables -P OUTPUT DROP

Then I see that, although node2 clearly knows it is isolated (it doesn't see
the other 2 nodes and does not have quorum), it does not stop its services:

root@node2:~# crm status
Online: [ node2 ]
OFFLINE: [ node1 node3 ]

 Master/Slave Set: ms_drbd [drbd]
     Masters: [ node2 ]
     Stopped: [ node1 node3 ]
 Resource Group: server
     fs          (ocf::heartbeat:Filesystem):    Started node2
     postgresql  (lsb:postgresql):               Started node2
     bind9       (lsb:bind9):                    Started node2
     nginx       (lsb:nginx):                    Started node2

So is there a way to tell pacemaker to shut down a node's services when it
becomes isolated? (See also the sketch after the quoted thread below.)

On Mon, Jan 12, 2015 at 8:25 PM, David Vossel <dvos...@redhat.com> wrote:
>
>
> ----- Original Message -----
> > Hello.
> >
> > I have a 3-node cluster managed by corosync+pacemaker+crm. Node1 and node2
> > are a DRBD master-slave pair; they also have a number of other services
> > installed (postgresql, nginx, ...). Node3 is just a corosync node (for
> > quorum); no DRBD/postgresql/... is installed on it, only
> > corosync+pacemaker.
> >
> > But when I add resources to the cluster, some of them are somehow moved to
> > node3 and then fail. Note that I have a "colocation" directive to place
> > these resources on the DRBD master only and a "location" with -inf for
> > node3, but this does not help - why? How can I make pacemaker not run
> > anything on node3?
> >
> > All the resources are added in a single transaction: "cat config.txt |
> > crm -w -f- configure", where config.txt contains the directives and a
> > "commit" statement at the end.
> >
> > Below are "crm status" (error messages) and "crm configure show" outputs.
> >
> >
> > root@node3:~# crm status
> > Current DC: node2 (1017525950) - partition with quorum
> > 3 Nodes configured
> > 6 Resources configured
> >
> > Online: [ node1 node2 node3 ]
> >
> >  Master/Slave Set: ms_drbd [drbd]
> >      Masters: [ node1 ]
> >      Slaves: [ node2 ]
> >  Resource Group: server
> >      fs          (ocf::heartbeat:Filesystem):    Started node1
> >      postgresql  (lsb:postgresql):               Started node3 FAILED
> >      bind9       (lsb:bind9):                    Started node3 FAILED
> >      nginx       (lsb:nginx):                    Started node3 (unmanaged) FAILED
> >
> > Failed actions:
> >     drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
> >       last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not installed
> >     postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
> >       last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown error
> >     bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
> >       last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown error
> >     nginx_stop_0 (node=node3, call=767, rc=5, status=complete,
> >       last-rc-change=Mon Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed
>
> Here's what is going on. Even when you say "never run this resource on
> node3", pacemaker is going to probe for the resource on node3 regardless,
> just to verify the resource isn't running.
>
> The failures you are seeing (the failed monitor_0 actions) indicate that
> pacemaker was unable to verify whether the resources are running on node3,
> because the related packages for those resources are not installed. Given
> pacemaker's default behavior I'd expect this.
>
> You have two options.
>
> 1. Install the resource-related packages on node3 even though you never want
> them to run there. This will allow the resource agents to verify that the
> resource is in fact inactive.
>
> 2. If you are using the current master branch of pacemaker, there's a new
> location constraint option called 'resource-discovery=always|never|exclusive'.
> If you add the 'resource-discovery=never' option to the location constraint
> that keeps resources off node3, you'll avoid having pacemaker perform the
> 'monitor_0' actions on node3 as well.
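For reference (untested on my side, since I'm not running the master branch):
as I understand it, resource-discovery is an attribute of the location
constraint itself, so in raw CIB XML such a constraint would look roughly
like this:

    <rsc_location id="loc_server" rsc="server" node="node3"
                  score="-INFINITY" resource-discovery="never"/>

i.e. the equivalent of my existing loc_server constraint, plus the
resource-discovery attribute. (crmsh may also have a keyword for this on the
"location" statement, but I haven't checked.)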
>
> -- Vossel
>
> >
> > root@node3:~# crm configure show | cat
> > node $id="1017525950" node2
> > node $id="13071578" node3
> > node $id="1760315215" node1
> > primitive drbd ocf:linbit:drbd \
> >     params drbd_resource="vlv" \
> >     op start interval="0" timeout="240" \
> >     op stop interval="0" timeout="120"
> > primitive fs ocf:heartbeat:Filesystem \
> >     params device="/dev/drbd0" directory="/var/lib/vlv.drbd/root" options="noatime,nodiratime" fstype="xfs" \
> >     op start interval="0" timeout="300" \
> >     op stop interval="0" timeout="300"
> > primitive postgresql lsb:postgresql \
> >     op monitor interval="10" timeout="60" \
> >     op start interval="0" timeout="60" \
> >     op stop interval="0" timeout="60"
> > primitive bind9 lsb:bind9 \
> >     op monitor interval="10" timeout="60" \
> >     op start interval="0" timeout="60" \
> >     op stop interval="0" timeout="60"
> > primitive nginx lsb:nginx \
> >     op monitor interval="10" timeout="60" \
> >     op start interval="0" timeout="60" \
> >     op stop interval="0" timeout="60"
> > group server fs postgresql bind9 nginx
> > ms ms_drbd drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > location loc_server server rule $id="loc_server-rule" -inf: #uname eq node3
> > colocation col_server inf: server ms_drbd:Master
> > order ord_server inf: ms_drbd:promote server:start
> > property $id="cib-bootstrap-options" \
> >     stonith-enabled="false" \
> >     last-lrm-refresh="1421079189" \
> >     maintenance-mode="false"
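P.S. Regarding my question above about shutting down services on an isolated
node: if I read the documentation right, the relevant knob should be the
no-quorum-policy cluster property. A rough sketch (crmsh syntax, not yet
tested in my setup) would be:

    # stop all resources in a partition that has lost quorum
    crm configure property no-quorum-policy=stop

    # there is also no-quorum-policy=suicide, but that requires working
    # fencing (stonith-enabled=true), which I don't have here

If I understand correctly the default is already "stop", so I may be missing
something about why node2 kept its services running in my test.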
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org