Hello,
I have configured a KVM virtual machine primitive using Pacemaker 1.1.6 and Heartbeat 3.0.5 on Ubuntu 10.04 Server, with DRBD as the storage device (so there is no shared storage and no live migration):

primitive p_vm ocf:heartbeat:VirtualDomain \
        params config="/vmstore/config/vm.xml" \
        meta allow-migrate="false" \
        op start interval="0" timeout="180s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="10" timeout="30"

If the VM hangs while shutting down, I would expect the following events to happen during failover on the "from" node (the migration source):

1. VirtualDomain issues "virsh shutdown vm" to gracefully shut down the VM.
2. Pacemaker waits 120 seconds, the timeout specified in the "op stop" line.
3. VirtualDomain waits a bit less than 120 seconds to see whether the VM shuts down gracefully. Once it gets to almost 120 seconds, it issues "virsh destroy vm" to hard-stop the VM.
4. Pacemaker wakes up from the 120-second timeout, sees that the VM has stopped, and proceeds with the failover.

However, I observed that VirtualDomain seems to be using the timeout from the "op start" line, 180 seconds, while Pacemaker uses the 120-second timeout. Thus, the VM is still running when the Pacemaker timeout is reached, and so the node is STONITHed.

Here is the relevant section of code from /usr/lib/ocf/resource.d/heartbeat/VirtualDomain:

    VirtualDomain_Stop() {
        local i
        local status
        local shutdown_timeout
        local out ex

        VirtualDomain_Status
        status=$?

        case $status in
            $OCF_SUCCESS)
                if ! ocf_is_true $OCF_RESKEY_force_stop; then
                    # Issue a graceful shutdown request
                    ocf_log info "Issuing graceful shutdown request for domain ${DOMAIN_NAME}."
                    virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}
                    # The "shutdown_timeout" we use here is the operation
                    # timeout specified in the CIB, minus 5 seconds
                    shutdown_timeout=$(( $NOW + ($OCF_RESKEY_CRM_meta_timeout/1000) -5 ))
                    # Loop on status until we reach $shutdown_timeout
                    while [ $NOW -lt $shutdown_timeout ]; do

Doesn't $OCF_RESKEY_CRM_meta_timeout correspond to the timeout value in the "op stop ..." line?

How can I change my Pacemaker configuration so that the VM attempts a graceful shutdown and, in the worst case, is destroyed before the Pacemaker timeout is reached?

Moreover, is there anything I can do inside the VM (another Ubuntu 10.04 install) to speed up the shutdown process?

Thanks,
Andrew
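
P.S. To make the question concrete, here is a minimal sketch of the arithmetic in the excerpt above. It assumes, as the "/1000" in the agent code suggests, that OCF_RESKEY_CRM_meta_timeout is given in milliseconds. The function name graceful_window_seconds is my own and is not part of the resource agent.

```shell
#!/bin/sh
# Hypothetical helper (not part of VirtualDomain): given an operation
# timeout in milliseconds, print how many seconds the agent would wait
# for a graceful shutdown before giving up, using the same
# "timeout minus 5 seconds" rule as the excerpt.
graceful_window_seconds() {
    timeout_ms=$1
    margin_s=${2:-5}    # the excerpt subtracts 5 seconds
    echo $(( $timeout_ms / 1000 - $margin_s ))
}

# "op stop" timeout of 120s, assumed to be passed as 120000 ms
graceful_window_seconds 120000    # prints 115

# "op start" timeout of 180s, assumed to be passed as 180000 ms
graceful_window_seconds 180000    # prints 175
```

With the 120s stop timeout I would expect a graceful window of 115 seconds, which ends before Pacemaker's own timeout. What I observed looks more like the 175 seconds that the 180s start timeout would give.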