[Pacemaker] VirtualDomain Shutdown Timeout

Andrew Martin Sat, 24 Mar 2012 13:41:31 -0700

Hello,


I have configured a KVM virtual machine primitive using Pacemaker 1.1.6 and 
Heartbeat 3.0.5 on Ubuntu 10.04 Server, with DRBD as the storage device (so 
there is no shared storage and no live migration): 

primitive p_vm ocf:heartbeat:VirtualDomain \
    params config="/vmstore/config/vm.xml" \
    meta allow-migrate="false" \
    op start interval="0" timeout="180s" \
    op stop interval="0" timeout="120s" \
    op monitor interval="10" timeout="30"


I would expect the following events to happen on failover on the "from" node 
(the migration source) if the VM hangs while shutting down: 
1. VirtualDomain issues "virsh shutdown vm" to gracefully shut down the VM 
2. pacemaker waits up to 120 seconds, the timeout specified in the "op stop" 
line 
3. VirtualDomain waits a bit less than 120 seconds to see whether the VM shuts 
down gracefully. Once it gets to almost 120 seconds, it issues "virsh destroy 
vm" to hard stop the VM. 
4. pacemaker wakes up from the 120-second timeout, sees that the VM has 
stopped, and proceeds with the failover 
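
The sequence I expect can be sketched roughly as follows. This is only an illustration, not the agent's actual code: the helper names (request_shutdown, domain_running, force_destroy) and the stop_with_deadline function are hypothetical stand-ins that print the virsh command instead of running it, and domain_running is stubbed to simulate a VM that hangs.

```shell
# Hypothetical stand-ins; a real agent would invoke virsh here.
request_shutdown() { echo "virsh shutdown $1"; }
force_destroy()    { echo "virsh destroy $1"; }
domain_running()   { return 0; }   # simulate a VM that never stops

# stop_with_deadline DOMAIN STOP_TIMEOUT_S MARGIN_S
# Ask for a graceful shutdown, poll until shortly before the stop
# timeout, then fall back to a hard stop.
stop_with_deadline() {
    domain=$1
    deadline=$(( $(date +%s) + $2 - $3 ))
    request_shutdown "$domain"
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # Leave early if the domain went away on its own
        domain_running "$domain" || return 0
        sleep 1
    done
    force_destroy "$domain"
}
```

With the configuration above, the call would be something like "stop_with_deadline vm 120 5", so the hard stop happens about 5 seconds before pacemaker's own timeout expires.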


However, I observed that VirtualDomain seems to be using the 180-second timeout 
from the "op start" line, while pacemaker enforces the 120-second stop timeout. 
As a result, the VM is still running when the pacemaker timeout expires, so the 
stop operation fails and the node is STONITHed. Here is the relevant section of 
code from /usr/lib/ocf/resource.d/heartbeat/VirtualDomain: 
VirtualDomain_Stop() {
    local i
    local status
    local shutdown_timeout
    local out ex

    VirtualDomain_Status
    status=$?

    case $status in
        $OCF_SUCCESS)
            if ! ocf_is_true $OCF_RESKEY_force_stop; then
                # Issue a graceful shutdown request
                ocf_log info "Issuing graceful shutdown request for domain ${DOMAIN_NAME}."
                virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}
                # The "shutdown_timeout" we use here is the operation
                # timeout specified in the CIB, minus 5 seconds
                shutdown_timeout=$(( $NOW + ($OCF_RESKEY_CRM_meta_timeout/1000) -5 ))
                # Loop on status until we reach $shutdown_timeout
                while [ $NOW -lt $shutdown_timeout ]; do


Doesn't $OCF_RESKEY_CRM_meta_timeout correspond to the timeout value in the "op 
stop ..." line? 
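
To make the arithmetic on the shutdown_timeout line concrete, here is a small sketch. It assumes $OCF_RESKEY_CRM_meta_timeout is passed in milliseconds, which the division by 1000 in the agent implies; grace_seconds is a hypothetical helper name, not part of the agent.

```shell
# grace_seconds TIMEOUT_MS
# Seconds the stop loop would wait before giving up on a graceful
# shutdown: the operation timeout in seconds, minus 5.
grace_seconds() {
    echo $(( $1 / 1000 - 5 ))
}

grace_seconds 120000   # "op stop" timeout of 120s  -> prints 115
grace_seconds 180000   # "op start" timeout of 180s -> prints 175
```

If the stop operation received 120000, the loop would end after 115 seconds, inside pacemaker's 120-second window. A value of 180000 would keep it waiting for 175 seconds, which would be consistent with the behaviour described above.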


How can I adjust my pacemaker configuration so that the agent attempts a 
graceful shutdown and, in the worst case, destroys the VM before the pacemaker 
timeout is reached? Moreover, is there anything I can do inside the VM 
(another Ubuntu 10.04 install) to speed up the shutdown process? 


Thanks, 


Andrew

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
