This topic has popped up occasionally over the years. Unfortunately, we still don't have a good answer for you.

The "least worst" practice has been to have the RA return OCF_NOT_RUNNING (i.e. "stopped") for non-recurring monitor operations (a.k.a. startup probes) if and only if its prerequisites (i.e. binaries, or things that might be on a cluster file system) are not available.

Possibly we need to begin using the ordering constraints (normally used for ordering start operations) for the startup probes too. I.e.

    order(A, B) ==> A.start before B.(monitor_0, start)

I had been resisting that move, but perhaps it's time. (It would also help avoid slamming the cluster with a bazillion operations in parallel when several nodes start up together.)

Lars? Florian? Comments?

On Fri, Oct 12, 2012 at 3:09 AM, Tom Fernandes <anyaddr...@gmx.net> wrote:
> Hi all,
>
> I have a 2-node-cluster running DRBD, libvirtd and a virtual machine.
>
> I observed that when I stop and start corosync on one of the nodes, pacemaker
> (when starting corosync again) wants to check the status of the vm before
> starting libvirtd. This check fails as libvirtd needs to be running for this
> check. After trying for 20s libvirtd starts. The vm gets restarted after those
> 20s and then runs on one of the nodes. I am left with a monitoring error to
> clean up and my vm has rebooted.
>
> One solution seems to be to run libvirtd outside the cluster, being managed by
> the OS.
>
> I followed the ha-kvm.pdf guide and other people's advice with my setup and
> wonder if either the guide is wrong / untested or if I'm missing something?
>
> This was also discussed with some of the folks on #linux-ha a couple of hours
> back.
>
> warm regards,
>
> Tom
>
> node pcmk-1 \
>         attributes standby="off"
> node pcmk-2 \
>         attributes standby="off"
> primitive vm1 ocf:heartbeat:VirtualDomain \
>         params config="/etc/libvirt/qemu/vm1.xml" \
>         meta allow-migrate="false" target-role="Started" \
>         op monitor interval="60" timeout="30" \
>         op start interval="0" timeout="90" \
>         op stop interval="0" timeout="120"
> primitive drbd_vm1 ocf:linbit:drbd \
>         params drbd_resource="vm1"
> primitive libvirtd lsb:libvirt-bin
> ms ms-drbd_vm1 drbd_vm1 \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> clone cl-libvirtd libvirtd \
>         meta interleave="true" clone-max="2"
> colocation vm1_on_drbd inf: vm1 ms-drbd_vm1:Master
> order cl-libvirtd_before_vm1 inf: cl-libvirtd:start vm1:start
> order drbd_before_vm1 inf: ms-drbd_vm1:promote vm1:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1349962834"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
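
P.S. For anyone unsure what the "least worst" practice from the top of this message looks like inside an agent, here is a rough sketch. It is not taken from VirtualDomain or any other shipped RA. The function names and the REQUIRED_BINARY variable are made up for illustration, and the prerequisite check is reduced to "is the binary on the PATH". The numeric return codes are the usual OCF ones ("stopped" is OCF_NOT_RUNNING, 7), and a probe is assumed to be a monitor whose interval is 0.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not from a real agent.

# Standard OCF return codes used here.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# A startup probe is a monitor operation with interval 0. Pacemaker passes
# the interval to the agent in OCF_RESKEY_CRM_meta_interval.
is_probe() {
    [ "$1" = "monitor" ] && [ "${OCF_RESKEY_CRM_meta_interval:-0}" = "0" ]
}

# Hypothetical prerequisite check: is the binary we depend on available?
# REQUIRED_BINARY is an assumption made for this example.
prereqs_available() {
    command -v "${REQUIRED_BINARY:-virsh}" >/dev/null 2>&1
}

agent_monitor() {
    action=$1
    if ! prereqs_available; then
        if is_probe "$action"; then
            # Prerequisites missing during a probe: report "stopped"
            # instead of an error, since the resource cannot be running.
            return $OCF_NOT_RUNNING
        fi
        # Missing prerequisites on a recurring monitor are a real error.
        return $OCF_ERR_INSTALLED
    fi
    # A real agent would query the actual resource state here.
    return $OCF_SUCCESS
}
```

With something like this, a probe that runs before the prerequisite is up reports "not running" rather than leaving a failed monitor to clean up, while a recurring monitor still fails loudly. It does nothing about the ordering question itself.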