Hello,
I have a three-node Pacemaker+Heartbeat cluster (two real nodes and one "standby" quorum node) running Ubuntu 10.04 LTS, using the Pacemaker and Heartbeat packages from the Ubuntu HA Team PPA (https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa). I have configured three DRBD resources, a filesystem mount, and a KVM-based virtual machine (managed by the VirtualDomain resource agent). Constraints are in place so that the DRBD devices must be promoted to primary and the filesystem must be mounted before the VM can start:

node $id="1ab0690c-5aa0-4d9c-ae4e-b662e0ca54e5" vmhost1
node $id="219e9bf6-ea99-41f4-895f-4c2c5c78484a" quorumnode \
    attributes standby="on"
node $id="645e09b4-aee5-4cec-a241-8bd4e03a78c3" vmhost2
primitive p_drbd_mount2 ocf:linbit:drbd \
    params drbd_resource="mount2" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" \
    op monitor interval="10" role="Master" timeout="30" \
    op monitor interval="20" role="Slave" timeout="30"
primitive p_drbd_mount1 ocf:linbit:drbd \
    params drbd_resource="mount1" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" \
    op monitor interval="10" role="Master" timeout="30" \
    op monitor interval="20" role="Slave" timeout="30"
primitive p_drbd_vmstore ocf:linbit:drbd \
    params drbd_resource="vmstore" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" \
    op monitor interval="10" role="Master" timeout="30" \
    op monitor interval="20" role="Slave" timeout="30"
primitive p_fs_vmstore ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/mnt/storage/vmstore" fstype="ext4" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="60" \
    op monitor interval="20" timeout="40"
primitive p_ping ocf:pacemaker:ping \
    params name="p_ping" host_list="192.168.1.25 192.168.1.26" multiplier="1000" \
    op start interval="0" timeout="60" \
    op monitor interval="20s" timeout="60"
primitive p_sysadmin_notify ocf:heartbeat:MailTo \
    params email="al...@example.com" \
    params subject="Pacemaker Change" \
    op start interval="0" timeout="10" \
    op stop interval="0" timeout="10" \
    op monitor interval="10" timeout="10"
primitive p_vm_myvm ocf:heartbeat:VirtualDomain \
    params config="/mnt/storage/vmstore/config/myvm.xml" \
    meta allow-migrate="false" target-role="Started" is-managed="true" \
    op start interval="0" timeout="180" \
    op stop interval="0" timeout="180" \
    op monitor interval="10" timeout="30"
primitive stonithquorumnode stonith:external/webpowerswitch \
    params wps_ipaddr="192.168.3.100" wps_port="x" wps_username="xxx" wps_password="xxx" hostname_to_stonith="quorumnode"
primitive stonithvmhost1 stonith:external/webpowerswitch \
    params wps_ipaddr="192.168.3.100" wps_port="x" wps_username="xxx" wps_password="xxx" hostname_to_stonith="vmhost1"
primitive stonithvmhost2 stonith:external/webpowerswitch \
    params wps_ipaddr="192.168.3.100" wps_port="x" wps_username="xxx" wps_password="xxx" hostname_to_stonith="vmhost2"
group g_vm p_fs_vmstore p_vm_myvm
ms ms_drbd_mount2 p_drbd_mount2 \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms ms_drbd_mount1 p_drbd_mount1 \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms ms_drbd_vmstore p_drbd_vmstore \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone cl_ping p_ping \
    meta interleave="true"
clone cl_sysadmin_notify p_sysadmin_notify
location loc_run_on_most_connected g_vm \
    rule $id="loc_run_on_most_connected-rule" p_ping: defined p_ping
location loc_st_nodescan stonithquorumnode -inf: vmhost1
location loc_st_vmhost1 stonithvmhost1 -inf: vmhost1
location loc_st_vmhost2 stonithvmhost2 -inf: vmhost2
colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master ms_drbd_mount1:Master ms_drbd_mount2:Master
order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote ms_drbd_mount2:promote g_vm:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="true" \
    no-quorum-policy="freeze" \
    last-lrm-refresh="1337746179"

This has been working well; however, last week Pacemaker suddenly stopped the p_vm_myvm resource and then started it up again. I have included the relevant section of /var/log/daemon.log below, but I am unable to determine from it what caused Pacemaker to restart this resource. Based on the log, could you tell me what event triggered the restart?
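In case it helps with reading the configuration: the placement scores behind loc_run_on_most_connected can be dumped from the live CIB. This is just a sketch of how I understand the tooling - it assumes the crm_simulate shipped with these 1.1.6 packages supports the -L/--live-check and -s/--show-scores options (the older ptest should accept the same flags):

# dump allocation scores from the live cluster and pick out the VM group
crm_simulate -L -s | grep -E 'g_vm|p_vm_myvm'

The same options against a saved CIB file (crm_simulate -x <file> -s) should work for offline inspection.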
Jun 14 15:25:00 vmhost1 lrmd: [3853]: info: rsc:p_sysadmin_notify:0 monitor[18] (pid 3661)
Jun 14 15:25:00 vmhost1 lrmd: [3853]: info: operation monitor[18] on p_sysadmin_notify:0 for client 3856: pid 3661 exited with return code 0
Jun 14 15:26:42 vmhost1 cib: [3852]: info: cib_stats: Processed 219 operations (182.00us average, 0% utilization) in the last 10min
Jun 14 15:32:43 vmhost1 lrmd: [3853]: info: operation monitor[22] on p_ping:0 for client 3856: pid 10059 exited with return code 0
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_vmstore:0 monitor[55] (pid 12323)
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount2:0 monitor[53] (pid 12324)
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[55] on p_drbd_vmstore:0 for client 3856: pid 12323 exited with return code 8
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[53] on p_drbd_mount2:0 for client 3856: pid 12324 exited with return code 8
Jun 14 15:35:31 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount1:0 monitor[54] (pid 12396)
Jun 14 15:35:31 vmhost1 lrmd: [3853]: info: operation monitor[54] on p_drbd_mount1:0 for client 3856: pid 12396 exited with return code 8
Jun 14 15:36:42 vmhost1 cib: [3852]: info: cib_stats: Processed 220 operations (272.00us average, 0% utilization) in the last 10min
Jun 14 15:37:34 vmhost1 lrmd: [3853]: info: rsc:p_vm_myvm monitor[57] (pid 14061)
Jun 14 15:37:34 vmhost1 lrmd: [3853]: info: operation monitor[57] on p_vm_myvm for client 3856: pid 14061 exited with return code 0
Jun 14 15:42:35 vmhost1 attrd: [3855]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_ping (1000)
Jun 14 15:42:35 vmhost1 attrd: [3855]: notice: attrd_perform_update: Sent update 163: p_ping=1000
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: cancel_op: operation monitor[53] on p_drbd_mount2:0 for client 3856, its parameters: CRM_meta_clone=[0] CRM_meta_role=[Master] CRM_meta_notify_slave_resource=[ ] CRM_meta_notify_active_resource=[ ] CRM_meta_timeout=[30000] CRM_meta_notify_demote_uname=[ ] drbd_resource=[crm] CRM_meta_master_node_max=[1] CRM_meta_notify_stop_resource=[ ] CRM_meta_notify_master_resource=[ ] CRM_meta_clone_node_max=[1] CRM_meta_notify_demote_resource=[ ] CRM_meta_clone_max=[2] CRM_meta_notify_start_resource=[p_drbd_mount2:0 ] CRM_meta_notify_ cancelled
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: send_direct_ack: ACK'ing resource op p_drbd_mount2:0_monitor_10000 from 1:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b: lrm_invoke-lrmd-1339706556-53
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: cancel_op: operation monitor[54] on p_drbd_mount1:0 for client 3856, its parameters: CRM_meta_clone=[0] CRM_meta_role=[Master] CRM_meta_notify_slave_resource=[ ] CRM_meta_notify_active_resource=[ ] CRM_meta_timeout=[30000] CRM_meta_notify_demote_uname=[ ] drbd_resource=[tools] CRM_meta_master_node_max=[1] CRM_meta_notify_stop_resource=[ ] CRM_meta_notify_master_resource=[ ] CRM_meta_clone_node_max=[1] CRM_meta_notify_demote_resource=[ ] CRM_meta_clone_max=[2] CRM_meta_notify_start_resource=[p_drbd_mount1:0 ] CRM_meta_n cancelled
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: send_direct_ack: ACK'ing resource op p_drbd_mount1:0_monitor_10000 from 6:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b: lrm_invoke-lrmd-1339706556-54
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: cancel_op: operation monitor[55] on p_drbd_vmstore:0 for client 3856, its parameters: CRM_meta_clone=[0] CRM_meta_role=[Master] CRM_meta_notify_slave_resource=[ ] CRM_meta_notify_active_resource=[ ] CRM_meta_timeout=[30000] CRM_meta_notify_demote_uname=[ ] drbd_resource=[vmstore] CRM_meta_master_node_max=[1] CRM_meta_notify_stop_resource=[ ] CRM_meta_notify_master_resource=[ ] CRM_meta_clone_node_max=[1] CRM_meta_notify_demote_resource=[ ] CRM_meta_clone_max=[2] CRM_meta_notify_start_resource=[p_drbd_vmstore:0 ] CRM_ cancelled
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: send_direct_ack: ACK'ing resource op p_drbd_vmstore:0_monitor_10000 from 2:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b: lrm_invoke-lrmd-1339706556-55
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: cancel_op: operation monitor[57] on p_vm_myvm for client 3856, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.5] config=[/mnt/storage/vmstore/config/myvm.xml] CRM_meta_timeout=[30000] CRM_meta_interval=[10000] cancelled
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing key=136:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_vm_myvm_stop_0 )
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: rsc:p_vm_myvm stop[58] (pid 18174)
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing key=168:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_drbd_mount2:0_notify_0 )
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount2:0 notify[59] (pid 18175)
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing key=184:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_drbd_mount1:0_notify_0 )
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount1:0 notify[60] (pid 18177)
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing key=200:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_drbd_vmstore:0_notify_0 )
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: rsc:p_drbd_vmstore:0 notify[61] (pid 18179)
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_drbd_mount2:0_monitor_10000 (call=53, status=1, cib-update=0, confirmed=true) Cancelled
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_drbd_mount1:0_monitor_10000 (call=54, status=1, cib-update=0, confirmed=true) Cancelled
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_drbd_vmstore:0_monitor_10000 (call=55, status=1, cib-update=0, confirmed=true) Cancelled
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_vm_myvm_monitor_10000 (call=57, status=1, cib-update=0, confirmed=true) Cancelled
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: operation notify[59] on p_drbd_mount2:0 for client 3856: pid 18175 exited with return code 0
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: operation notify[60] on p_drbd_mount1:0 for client 3856: pid 18177 exited with return code 0
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: send_direct_ack: ACK'ing resource op p_drbd_mount2:0_notify_0 from 168:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b: lrm_invoke-lrmd-1339706556-56
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_drbd_mount2:0_notify_0 (call=59, rc=0, cib-update=0, confirmed=true) ok
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: send_direct_ack: ACK'ing resource op p_drbd_mount1:0_notify_0 from 184:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b: lrm_invoke-lrmd-1339706556-57
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_drbd_mount1:0_notify_0 (call=60, rc=0, cib-update=0, confirmed=true) ok
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: operation notify[61] on p_drbd_vmstore:0 for client 3856: pid 18179 exited with return code 0
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: send_direct_ack: ACK'ing resource op p_drbd_vmstore:0_notify_0 from 200:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b: lrm_invoke-lrmd-1339706556-58
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_drbd_vmstore:0_notify_0 (call=61, rc=0, cib-update=0, confirmed=true) ok
Jun 14 15:42:36 vmhost1 VirtualDomain[18174]: [18244]: INFO: Issuing graceful shutdown request for domain myvm.
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: RA output: (p_vm_myvm:stop:stdout) Domain myvm is being shutdown
Jun 14 15:43:32 vmhost1 attrd: [3855]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_ping (2000)
Jun 14 15:43:32 vmhost1 attrd: [3855]: notice: attrd_perform_update: Sent update 165: p_ping=2000
Jun 14 15:43:58 vmhost1 lrmd: [3853]: info: RA output: (p_vm_myvm:stop:stderr) error: operation failed: could not query memory balloon allocation
Jun 14 15:43:58 vmhost1 VirtualDomain[18174]: [18899]: INFO: Virtual domain myvm currently has no state, retrying.
Jun 14 15:43:59 vmhost1 lrmd: [3853]: info: operation stop[58] on p_vm_myvm for client 3856: pid 18174 exited with return code 0
Jun 14 15:43:59 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_vm_myvm_stop_0 (call=58, rc=0, cib-update=83, confirmed=true) ok
Jun 14 15:44:01 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing key=31:2352:8:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_drbd_mount2:0_monitor_10000 )
Jun 14 15:44:01 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount2:0 monitor[62] (pid 18935)
Jun 14 15:44:01 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing key=64:2352:8:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_drbd_mount1:0_monitor_10000 )
Jun 14 15:44:01 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount1:0 monitor[63] (pid 18936)
Jun 14 15:44:01 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing key=97:2352:8:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_drbd_vmstore:0_monitor_10000 )
Jun 14 15:44:01 vmhost1 lrmd: [3853]: info: rsc:p_drbd_vmstore:0 monitor[64] (pid 18937)
Jun 14 15:44:01 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing key=128:2352:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_vm_myvm_start_0 )
Jun 14 15:44:01 vmhost1 lrmd: [3853]: info: rsc:p_vm_myvm start[65] (pid 18939)
Jun 14 15:44:01 vmhost1 lrmd: [3853]: info: operation monitor[62] on p_drbd_mount2:0 for client 3856: pid 18935 exited with return code 8
Jun 14 15:44:01 vmhost1 lrmd: [3853]: info: operation monitor[63] on p_drbd_mount1:0 for client 3856: pid 18936 exited with return code 8
Jun 14 15:44:01 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_drbd_mount2:0_monitor_10000 (call=62, rc=8, cib-update=84, confirmed=false) master
Jun 14 15:44:01 vmhost1 lrmd: [3853]: info: operation monitor[64] on p_drbd_vmstore:0 for client 3856: pid 18937 exited with return code 8
Jun 14 15:44:01 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_drbd_mount1:0_monitor_10000 (call=63, rc=8, cib-update=85, confirmed=false) master
Jun 14 15:44:01 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_drbd_vmstore:0_monitor_10000 (call=64, rc=8, cib-update=86, confirmed=false) master
Jun 14 15:44:01 vmhost1 VirtualDomain[18939]: [19033]: INFO: Domain name "myvm" saved to /var/run/resource-agents/VirtualDomain-p_vm_myvm.state.
Jun 14 15:44:02 vmhost1 lrmd: [3853]: info: RA output: (p_vm_myvm:start:stdout) Domain myvm started
Jun 14 15:44:02 vmhost1 lrmd: [3853]: info: operation start[65] on p_vm_myvm for client 3856: pid 18939 exited with return code 0
Jun 14 15:44:02 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_vm_myvm_start_0 (call=65, rc=0, cib-update=87, confirmed=true) ok
Jun 14 15:44:03 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing key=129:2352:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_vm_myvm_monitor_10000 )
Jun 14 15:44:03 vmhost1 lrmd: [3853]: info: rsc:p_vm_myvm monitor[66] (pid 19121)
Jun 14 15:44:03 vmhost1 lrmd: [3853]: info: operation monitor[66] on p_vm_myvm for client 3856: pid 19121 exited with return code 0
Jun 14 15:44:03 vmhost1 crmd: [3856]: info: process_lrm_event: LRM operation p_vm_myvm_monitor_10000 (call=66, rc=0, cib-update=88, confirmed=false) ok
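From the op keys in the excerpt (e.g. key=136:2351:0:...), transition 2351 appears to be the one that stopped p_vm_myvm, but the decision itself was made by pengine on the DC, so the reasoning does not show up in this lrmd/crmd log. If it is useful, this is how I understand the transition could be replayed - a sketch, assuming the default pe-input directory /var/lib/pengine on these packages, with NNNN a placeholder for the file number that would have to be taken from the DC's logs:

# on the DC: find which pe-input file transition 2351 was derived from
grep 'pe-input' /var/log/daemon.log | grep '2351'
# replay that transition, showing allocation scores and the planned actions
crm_simulate -x /var/lib/pengine/pe-input-NNNN.bz2 -S -s

One thing I do notice: attrd flushed p_ping=1000 at 15:42:35, immediately before the monitors were cancelled, and the value only returned to 2000 (both ping targets reachable) at 15:43:32. Given the loc_run_on_most_connected rule I wonder whether that connectivity dip is related, but I cannot confirm it from this log alone.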
Thanks,
Andrew