Hi,

On Sun, Mar 13, 2011 at 11:15:25PM +0300, Pavel Levshin wrote:
> Hi.
>
> You have hit this:
>
> Mar 3 16:49:16 breadnut2 VirtualDomain[20709]: INFO: Virtual domain vg.test1
> currently has no state, retrying.
> Mar 3 16:49:16 breadnut2 lrmd: [20694]: WARN: p-vd_vg.test1:monitor process
> (PID 20709) timed out (try 1). Killing with signal SIGTERM (15).
> Mar 3 16:49:16 breadnut2 lrmd: [20694]: WARN: operation monitor[5] on
> ocf::VirtualDomain::p-vd_vg.test1 for client 20697, its parameters:
> crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/vg.test1.xml]
> CRM_meta_timeout=[20000] migration_transport=[tcp] : pid [20709] timed out
> Mar 3 16:49:16 breadnut2 crmd: [20697]: ERROR: process_lrm_event: LRM
> operation p-vd_vg.test1_monitor_0 (5) Timed Out (timeout=20000ms)
>
> When a cluster node comes up, it is directed to probe every clustered
> resource on that node. This behaviour does not depend on constraints;
> the check is mandatory.
>
> At that point, libvirtd is not yet running, so the VirtualDomain RA cannot
> connect to it to check whether your VM is running, and the probe eventually
> times out.
>
> A timed-out monitor action implies an "unknown error" for the resource.
> The pengine cannot be sure that your resource is not running, so it assumes
> it is, stops the resource everywhere, and then starts it again to recover.
>
> That is what you are seeing. How to work around it is a different story;
> frankly, I don't see a decent way.
>
> The VirtualDomain RA really cannot tell whether the VM is running while it
> cannot connect to libvirtd. I'm not entirely sure, but your log suggests
> that libvirtd will not be started until the VirtualDomain monitor returns.
>
> I'd suggest starting libvirtd before corosync, from the init scripts, and
> seeing if that helps.

Right.

> Can anyone propose a cleaner solution?

No. The RA clearly states that libvirtd is required. The corosync/heartbeat
init scripts should have it as Should-Start.

Thanks,

Dejan
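For illustration, a minimal sketch of the Should-Start / init-ordering approach
described above, on Debian Squeeze with dependency-based boot via insserv. The
LSB header excerpt and the libvirt service name ("libvirt-bin" vs. "libvirtd")
are assumptions, not taken from the actual packages:

    # /etc/init.d/corosync -- LSB header excerpt; add the libvirt init script
    # to Should-Start so it is ordered before corosync at boot
    ### BEGIN INIT INFO
    # Provides:          corosync
    # Required-Start:    $network $remote_fs $syslog
    # Required-Stop:     $network $remote_fs $syslog
    # Should-Start:      libvirt-bin
    # Default-Start:     2 3 4 5
    # Default-Stop:      0 1 6
    ### END INIT INFO

    # re-register the script so insserv recomputes the boot order
    insserv corosync

Until the packaged init scripts carry such a dependency, the same effect can be
had by simply making sure the libvirt init script starts earlier than corosync
in the relevant runlevels, as Pavel suggests.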
> --
> Pavel Levshin
>
> On 03.03.2011 9:05, AP wrote:
>> Hi,
>>
>> I'm having deep issues with my cluster setup. Everything works OK until I
>> add a VirtualDomain RA; then things go pear-shaped, in that it seems to
>> ignore the "order" crm config for it and starts as soon as it can.
>>
>> The crm config is provided below. Basically, p-vd_vg.test1 attempts to
>> start despite p-libvirtd not being started and p-drbd_vg.test1 not being
>> master (or slave, for that matter - i.e. it is not configured at all).
>>
>> Eventually p-libvirtd and p-drbd_vg.test1 start, p-vd_vg.test1 attempts to
>> as well, and pengine on the node where p-vd_vg.test1 is already running
>> complains with:
>>
>> Mar 3 16:49:16 breadnut pengine: [2097]: ERROR: native_create_actions:
>> Resource p-vd_vg.test1 (ocf::VirtualDomain) is active on 2 nodes attempting
>> recovery
>> Mar 3 16:49:16 breadnut pengine: [2097]: WARN: See
>> http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
>>
>> Then mass slaughter occurs and p-vd_vg.test1 is restarted where it was
>> running previously, while the other node gets an error for it.
>>
>> Essentially, I cannot restart the 2nd node without it breaking the 1st.
>>
>> Now, as I understand it, a lone primitive will run once on any node - this
>> is just fine by me.
>>
>> colo-vd_vg.test1 indicates that p-vd_vg.test1 should run where
>> ms-drbd_vg.test1 is master, and ms-drbd_vg.test1 should only be master
>> where clone-libvirtd is started.
>>
>> order-vg.test1 indicates that ms-drbd_vg.test1 should start after
>> clone-lvm_gh has started (successfully). (This used to have a promote for
>> ms-drbd_vg.test1, but then ms-drbd_vg.test1 would be demoted and not
>> stopped on shutdown, which would cause clone-lvm_gh to error out on stop.)
>>
>> order-vd_vg.test1 indicates that p-vd_vg.test1 should only start where
>> ms-drbd_vg.test1 and clone-libvirtd have both successfully started (the
>> order of their starting being irrelevant).
>>
>> cli-standby-p-vd_vg.test1 was put there by my migrating p-vd_vg.test1
>> about the place.
>>
>> This happens with or without fencing, and with fencing configured as below
>> or as just a single primitive with both nodes in the hostlist.
>>
>> Help with this would be awesome and appreciated. I do not know what I am
>> missing here. The config makes sense to me, so I don't even know where to
>> start poking and prodding. I be flailing.
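As an aside on the cli-standby-p-vd_vg.test1 constraint mentioned above:
constraints left behind by a migrate can usually be cleared from the crm shell
rather than kept in the configuration. A sketch, assuming the crm shell bundled
with this Pacemaker version; check the constraint id with "crm configure show"
first:

    # drop the location constraint created by an earlier "crm resource migrate"
    crm resource unmigrate p-vd_vg.test1

    # or delete it explicitly by id
    crm configure delete cli-standby-p-vd_vg.test1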
>>
>> Config and s/w version list is below:
>>
>> OS: Debian Squeeze
>> Kernel: 2.6.37.2
>>
>> PACKAGES:
>>
>> ii  cluster-agents    1:1.0.4-0ubuntu1~custom1     The reusable cluster components for Linux HA
>> ii  cluster-glue      1.0.7-3ubuntu1~custom1       The reusable cluster components for Linux HA
>> ii  corosync          1.3.0-1ubuntu1~custom1       Standards-based cluster framework (daemon and modules)
>> ii  libccs3           3.1.0-0ubuntu1~custom1       Red Hat cluster suite - cluster configuration libraries
>> ii  libcib1           1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - CIB
>> ii  libcman3          3.1.0-0ubuntu1~custom1       Red Hat cluster suite - cluster manager libraries
>> ii  libcorosync4      1.3.0-1ubuntu1~custom1       Standards-based cluster framework (libraries)
>> ii  libcrmcluster1    1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - CRM
>> ii  libcrmcommon2     1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - common CRM
>> ii  libfence4         3.1.0-0ubuntu1~custom1       Red Hat cluster suite - fence client library
>> ii  liblrm2           1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- liblrm2
>> ii  libpe-rules2      1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - rules for P-Engine
>> ii  libpe-status3     1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - status for P-Engine
>> ii  libpengine3       1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - P-Engine
>> ii  libpils2          1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libpils2
>> ii  libplumb2         1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libplumb2
>> ii  libplumbgpl2      1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libplumbgpl2
>> ii  libstonith1       1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libstonith1
>> ii  libstonithd1      1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - stonith
>> ii  libtransitioner1  1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - transitioner
>> ii  pacemaker         1.1.5-0ubuntu1~ppa1~custom1  HA cluster resource manager
>>
>> CONFIG:
>>
>> node breadnut
>> node breadnut2 \
>>     attributes standby="off"
>> primitive fencing-bn stonith:meatware \
>>     params hostlist="breadnut" \
>>     op start interval="0" timeout="60s" \
>>     op stop interval="0" timeout="70s" \
>>     op monitor interval="10" timeout="60s"
>> primitive fencing-bn2 stonith:meatware \
>>     params hostlist="breadnut2" \
>>     op start interval="0" timeout="60s" \
>>     op stop interval="0" timeout="70s" \
>>     op monitor interval="10" timeout="60s"
>> primitive p-drbd_vg.test1 ocf:linbit:drbd \
>>     params drbd_resource="vg.test1" \
>>     operations $id="ops-drbd_vg.test1" \
>>     op start interval="0" timeout="240s" \
>>     op stop interval="0" timeout="100s" \
>>     op monitor interval="20" role="Master" timeout="20s" \
>>     op monitor interval="30" role="Slave" timeout="20s"
>> primitive p-libvirtd ocf:local:libvirtd \
>>     meta allow-migrate="off" \
>>     op start interval="0" timeout="200s" \
>>     op stop interval="0" timeout="100s" \
>>     op monitor interval="10" timeout="200s"
>> primitive p-lvm_gh ocf:heartbeat:LVM \
>>     params volgrpname="gh" \
>>     meta allow-migrate="off" \
>>     op start interval="0" timeout="90s" \
>>     op stop interval="0" timeout="100s" \
>>     op monitor interval="10" timeout="100s"
>> primitive p-vd_vg.test1 ocf:heartbeat:VirtualDomain \
>>     params config="/etc/libvirt/qemu/vg.test1.xml" \
>>     params migration_transport="tcp" \
>>     meta allow-migrate="true" is-managed="true" \
>>     op start interval="0" timeout="120s" \
>>     op stop interval="0" timeout="120s" \
>>     op migrate_to interval="0" timeout="120s" \
>>     op migrate_from interval="0" timeout="120s" \
>>     op monitor interval="10s" timeout="120s"
interval="10s" timeout="120s" >> ms ms-drbd_vg.test1 p-drbd_vg.test1 \ >> meta resource-stickines="100" notify="true" master-max="2" >> target-role="Master" >> clone clone-libvirtd p-libvirtd \ >> meta interleave="true" >> clone clone-lvm_gh p-lvm_gh \ >> meta interleave="true" >> location cli-standby-p-vd_vg.test1 p-vd_vg.test1 \ >> rule $id="cli-standby-rule-p-vd_vg.test1" -inf: #uname eq breadnut2 >> location loc-fencing-bn fencing-bn -inf: breadnut >> location loc-fencing-bn2 fencing-bn2 -inf: breadnut2 >> colocation colo-vd_vg.test1 inf: p-vd_vg.test1:Started >> ms-drbd_vg.test1:Master clone-libvirtd:Started >> order order-vd_vg.test1 inf: ( ms-drbd_vg.test1:start clone-libvirtd:start ) >> p-vd_vg.test1:start >> order order-vg.test1 inf: clone-lvm_gh:start ms-drbd_vg.test1:start >> property $id="cib-bootstrap-options" \ >> dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \ >> cluster-infrastructure="openais" \ >> default-resource-stickiness="1000" \ >> stonith-enabled="true" \ >> expected-quorum-votes="2" \ >> no-quorum-policy="ignore" \ >> last-lrm-refresh="1299128317" >> >> >> >> _______________________________________________ >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: >> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: > http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker