I looked at /usr/lib/ocf/resource.d/heartbeat/VirtualDomain and also checked the permissions of the state file, and they were the same as on the working one. The state file was only empty on one node.
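For anyone hitting the same thing, here is a minimal sketch of the comparison described above, assuming a POSIX shell. The helper names `check_state_file` and `state_dir_writable` are hypothetical; they are not part of the VirtualDomain agent. The path is the one from the log quoted below. Run it on each node to see which copy of the state file is empty, and whether the directory holding it is writable by the user running the check.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers; NOT part of the VirtualDomain resource agent.

# Print the status of a VirtualDomain state file:
#   "missing"        - file does not exist (exit status 2)
#   "empty"          - file has no content (exit status 1)
#   "ok: <contents>" - file holds something, presumably the domain name (exit status 0)
check_state_file() {
    state_file=$1
    if [ ! -e "$state_file" ]; then
        echo "missing"
        return 2
    fi
    # Command substitution strips trailing newlines, so a zero-length file
    # and a file holding only newlines both count as empty here.
    content=$(cat "$state_file")
    if [ -z "$content" ]; then
        echo "empty"
        return 1
    fi
    echo "ok: $content"
    return 0
}

# Report whether the directory containing the state file is writable
# by the user running this script (result depends on who runs it).
state_dir_writable() {
    if [ -w "$(dirname "$1")" ]; then
        echo "writable"
    else
        echo "not writable"
    fi
}

# Usage on each node:
# f=/var/run/heartbeat/rsctmp/VirtualDomain-convirt-kvm.state
# check_state_file "$f"; state_dir_writable "$f"
```

Comparing the output from both nodes shows at a glance whether only one node's copy is empty, which is what happened here.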
On the node with the empty file I ran

    echo Convirt > /var/run/heartbeat/rsctmp/VirtualDomain-convirt-kvm.state

and restarted pacemaker, and the resource came up. I restarted it a few times without manually editing the state file and it still started up. The issue was very strange indeed.

Thanks for the speedy help.

On Nov 1, 2011, at 12:04 AM, Tim Serong wrote:

> On 11/01/2011 02:23 PM, Cliff Massey wrote:
>>
>> I am having a problem with my kvm resource. It was working until I decided
>> to re-install the kvm machine. The libvirt xml file and the pacemaker
>> configuration did not change. I can start the kvm outside of pacemaker just
>> fine. When I check the libvirt log, it shows no attempt to start the kvm
>> machine from pacemaker.
>>
>> crm_mon -1 shows:
>>
>> Online: [ admin01 admin02 ]
>>
>> convirt-kvm (ocf::heartbeat:VirtualDomain): Started admin01 (unmanaged) FAILED
>> Master/Slave Set: ms-convirt [convirt-drbd]
>>     Masters: [ admin02 ]
>>     Slaves: [ admin01 ]
>> sitescope-kvm (ocf::heartbeat:VirtualDomain): Started admin02
>> Master/Slave Set: ms-sitescope [sitescope-drbd]
>>     Masters: [ admin02 ]
>>     Slaves: [ admin01 ]
>>
>> Failed actions:
>>     convirt-kvm_monitor_0 (node=admin01, call=2, rc=1, status=complete): unknown error
>>     convirt-kvm_stop_0 (node=admin01, call=6, rc=1, status=complete): unknown error
>>
>> My other kvm machine with the same config works just fine.
>
> I can't tell you why it doesn't work anymore, but...
>
>>
>> my logs are at: http://pastebin.com/peFw5KKp
>
> The relevant bit of that log is (pardon the formatting):
>
> Nov 1 03:14:37 admin01 crmd: [15349]: info: te_rsc_command: Initiating action 4: monitor convirt-kvm_monitor_0 on admin01 (local)
> ...
> Nov 1 03:14:38 admin01 VirtualDomain[15370]: ERROR: /var/run/heartbeat/rsctmp/VirtualDomain-convirt-kvm.state is empty. This is unexpected. Cannot determine domain name.
> ...
> Nov 1 03:14:38 admin01 lrmd: [15346]: WARN: Managed convirt-kvm:monitor process 15370 exited with return code 1.
> ...
> Nov 1 03:14:38 admin01 crmd: [15349]: info: process_lrm_event: LRM operation convirt-kvm_monitor_0 (call=2, rc=1, cib-update=29, confirmed=true) unknown error
>
> So the probe (and presumably subsequent stop) for that resource failed, hence
> no attempt to start it. As for how the state file is empty, I'm not sure.
> Look at VirtualDomain_Define() in
> /usr/lib/ocf/resource.d/heartbeat/VirtualDomain (line ~200 onwards); by my
> reading it shouldn't be possible for that state file to be empty. Unless,
> somehow (wild guess), permissions on the state file or some parent directory
> prohibit writing?
>
> Regards,
>
> Tim
> --
> Tim Serong
> Senior Clustering Engineer
> SUSE
> tser...@suse.com
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker