----- Original Message -----
> Hello.
>
> I have a 3-node cluster managed by corosync+pacemaker+crm. Node1 and
> Node2 are the DRBD master and slave; they also have a number of other
> services installed (postgresql, nginx, ...). Node3 is just a corosync
> node (for quorum): no DRBD/postgresql/... is installed on it, only
> corosync+pacemaker.
>
> But when I add resources to the cluster, some of them are somehow moved
> to node3 and fail there. Note that I have a "colocation" directive to
> place these resources on the DRBD master only, and a "location"
> constraint with -inf for node3, but this does not help - why? How can I
> make pacemaker not run anything on node3?
>
> All the resources are added in a single transaction: "cat config.txt |
> crm -w -f- configure", where config.txt contains the directives and a
> "commit" statement at the end.
>
> Below are the "crm status" (error messages) and "crm configure show"
> outputs.
>
>
> root@node3:~# crm status
> Current DC: node2 (1017525950) - partition with quorum
> 3 Nodes configured
> 6 Resources configured
>
> Online: [ node1 node2 node3 ]
>
> Master/Slave Set: ms_drbd [drbd]
>     Masters: [ node1 ]
>     Slaves: [ node2 ]
> Resource Group: server
>     fs          (ocf::heartbeat:Filesystem):  Started node1
>     postgresql  (lsb:postgresql):             Started node3 FAILED
>     bind9       (lsb:bind9):                  Started node3 FAILED
>     nginx       (lsb:nginx):                  Started node3 (unmanaged) FAILED
>
> Failed actions:
>     drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
>       last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms):
>       not installed
>     postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
>       last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms):
>       unknown error
>     bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
>       last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms):
>       unknown error
>     nginx_stop_0 (node=node3, call=767, rc=5, status=complete,
>       last-rc-change=Mon Jan 12 11:16:44 2015, queued=1ms, exec=0ms):
>       not installed
Here's what is going on. Even when you say "never run this resource on
node3", pacemaker still probes the resource on node3, just to verify that
it isn't already running there. The failures you are seeing (the failed
"monitor_0" actions) mean that pacemaker could not verify the state of the
resources on node3, because the packages backing those resources are not
installed there. Given pacemaker's default behavior, this is expected.

You have two options.

1. Install the resource-related packages on node3 even though you never
   want the resources to run there. This allows the resource agents to
   verify that the resources are in fact inactive.

2. If you are using the current master branch of pacemaker, there is a new
   location constraint option, 'resource-discovery=always|never|exclusive'.
   If you add 'resource-discovery=never' to the location constraint that
   keeps resources off node3, pacemaker will skip the 'monitor_0' probes
   on node3 as well (a sketch follows at the end of this message, after
   your quoted configuration).

-- Vossel

> root@node3:~# crm configure show | cat
> node $id="1017525950" node2
> node $id="13071578" node3
> node $id="1760315215" node1
> primitive drbd ocf:linbit:drbd \
>     params drbd_resource="vlv" \
>     op start interval="0" timeout="240" \
>     op stop interval="0" timeout="120"
> primitive fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/var/lib/vlv.drbd/root" options="noatime,nodiratime" fstype="xfs" \
>     op start interval="0" timeout="300" \
>     op stop interval="0" timeout="300"
> primitive postgresql lsb:postgresql \
>     op monitor interval="10" timeout="60" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60"
> primitive bind9 lsb:bind9 \
>     op monitor interval="10" timeout="60" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60"
> primitive nginx lsb:nginx \
>     op monitor interval="10" timeout="60" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60"
> group server fs postgresql bind9 nginx
> ms ms_drbd drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> location loc_server server rule $id="loc_server-rule" -inf: #uname eq node3
> colocation col_server inf: server ms_drbd:Master
> order ord_server inf: ms_drbd:promote server:start
> property $id="cib-bootstrap-options" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1421079189" \
>     maintenance-mode="false"
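For illustration, a minimal sketch of option 2 applied to the loc_server
constraint quoted above. This is untested and assumes a crmsh release that
already understands the option:

    location loc_server server resource-discovery=never \
        rule $id="loc_server-rule" -inf: #uname eq node3

If your crmsh does not accept that yet, the same attribute can be set
directly on the rsc_location element in the CIB XML:

    <rsc_location id="loc_server" rsc="server" resource-discovery="never">
      <rule id="loc_server-rule" score="-INFINITY">
        <expression id="loc_server-rule-expr" attribute="#uname"
            operation="eq" value="node3"/>
      </rule>
    </rsc_location>

Either way, the option only suppresses the probes; it is still the
-INFINITY score that keeps the resources off node3. Once the constraint is
in place, the failures already recorded for node3 can be cleared with
"crm resource cleanup <resource>".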