> On 13 Jan 2015, at 4:25 am, David Vossel <dvos...@redhat.com> wrote:
>
> ----- Original Message -----
>> Hello.
>>
>> I have a 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2
>> are DRBD master/slave, and they also run a number of other services
>> (postgresql, nginx, ...). Node3 is a quorum-only corosync node: no
>> DRBD/postgresql/... are installed on it, only corosync+pacemaker.
>>
>> But when I add resources to the cluster, some of them are somehow moved to
>> node3 and then fail. Note that I have a "colocation" directive to place
>> these resources on the DRBD master only, and a "location" constraint with
>> -inf for node3, but this does not help - why? How can I make pacemaker not
>> run anything on node3?
>>
>> All the resources are added in a single transaction: "cat config.txt | crm -w
>> -f- configure", where config.txt contains the directives and a "commit"
>> statement at the end.
>>
>> Below are the "crm status" (error messages) and "crm configure show" outputs.
>>
>> root@node3:~# crm status
>> Current DC: node2 (1017525950) - partition with quorum
>> 3 Nodes configured
>> 6 Resources configured
>>
>> Online: [ node1 node2 node3 ]
>>
>> Master/Slave Set: ms_drbd [drbd]
>>     Masters: [ node1 ]
>>     Slaves: [ node2 ]
>> Resource Group: server
>>     fs          (ocf::heartbeat:Filesystem):  Started node1
>>     postgresql  (lsb:postgresql):             Started node3 FAILED
>>     bind9       (lsb:bind9):                  Started node3 FAILED
>>     nginx       (lsb:nginx):                  Started node3 (unmanaged) FAILED
>>
>> Failed actions:
>>     drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
>>         last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not installed
>>     postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
>>         last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown error
>>     bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
>>         last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown error
>>     nginx_stop_0 (node=node3, call=767, rc=5, status=complete,
>>         last-rc-change=Mon Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed
>
> Here's what is going on. Even when you say "never run this resource on
> node3", pacemaker still probes for the resource on node3, just to verify
> that it isn't already running there.
>
> The "monitor_0" failures you are seeing indicate that pacemaker could not
> verify whether the resources are running on node3, because the packages the
> resource agents depend on are not installed there. Given pacemaker's default
> behavior, I'd expect this.
>
> You have two options.
>
> 1. Install the resource-related packages on node3, even though you never
> want the resources to run there. This allows the resource agents to verify
> that the resources are in fact inactive.
> 1b. Or delete the resource agents from node3 as well; recent versions of
> pacemaker should handle this case correctly.
>
> 2. If you are using the current master branch of pacemaker, there's a new
> location constraint option called 'resource-discovery=always|never|exclusive'.
> If you add 'resource-discovery=never' to the location constraint that keeps
> resources off node3, pacemaker will skip the 'monitor_0' probe actions on
> node3 as well.
>
> -- Vossel
>
>>
>> root@node3:~# crm configure show | cat
>> node $id="1017525950" node2
>> node $id="13071578" node3
>> node $id="1760315215" node1
>> primitive drbd ocf:linbit:drbd \
>>     params drbd_resource="vlv" \
>>     op start interval="0" timeout="240" \
>>     op stop interval="0" timeout="120"
>> primitive fs ocf:heartbeat:Filesystem \
>>     params device="/dev/drbd0" directory="/var/lib/vlv.drbd/root" options="noatime,nodiratime" fstype="xfs" \
>>     op start interval="0" timeout="300" \
>>     op stop interval="0" timeout="300"
>> primitive postgresql lsb:postgresql \
>>     op monitor interval="10" timeout="60" \
>>     op start interval="0" timeout="60" \
>>     op stop interval="0" timeout="60"
>> primitive bind9 lsb:bind9 \
>>     op monitor interval="10" timeout="60" \
>>     op start interval="0" timeout="60" \
>>     op stop interval="0" timeout="60"
>> primitive nginx lsb:nginx \
>>     op monitor interval="10" timeout="60" \
>>     op start interval="0" timeout="60" \
>>     op stop interval="0" timeout="60"
>> group server fs postgresql bind9 nginx
>> ms ms_drbd drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> location loc_server server rule $id="loc_server-rule" -inf: #uname eq node3
>> colocation col_server inf: server ms_drbd:Master
>> order ord_server inf: ms_drbd:promote server:start
>> property $id="cib-bootstrap-options" \
>>     stonith-enabled="false" \
>>     last-lrm-refresh="1421079189" \
>>     maintenance-mode="false"
>>
>> _______________________________________________
>> Pacemaker mailing
list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
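
For reference, a minimal sketch of option 2 applied to the configuration shown above. This assumes a pacemaker build that supports the resource-discovery constraint option and a crmsh version that accepts it in location constraints; the exact crmsh syntax may differ by version (the underlying CIB attribute is resource-discovery="never" on the rsc_location element):

```
# Hypothetical rewrite of the existing loc_server constraint:
# keep the server group off node3 AND skip the monitor_0 probes there.
location loc_server server resource-discovery=never \
    rule -inf: #uname eq node3
```

With this in place, the "not installed" probe failures on node3 should no longer appear, since pacemaker never asks node3's resource agents about the group's members at all.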