[Pacemaker] lvm ra timeouts and vgdisplay hang

James Harper Wed, 17 Oct 2012 04:40:29 -0700

I've been having a problem with the LVM RA when used in conjunction with clvm 
when a node dies (e.g. when I destroy the VM to test this particular scenario).


clvm reorganises itself just fine and recovers well within the LVM RA 
timeout I set (60 seconds), but if the "vgdisplay -v vg-drbd" command is 
executed by the LVM RA monitor op while clvm is still learning that the node 
has dropped out, it hangs forever and the RA monitor times out.

I worked around this by changing the monitor of the RA to do this:

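        rc=124
        limit=10
        # Retry only while timeout had to kill vgdisplay:
        # 124 = TERM was enough, 137 (128+9) = KILL was needed.
        while [ $limit -ge 0 ] && { [ $rc -eq 124 ] || [ $rc -eq 137 ]; }
        do
                limit=`expr $limit - 1`
                # Capture the output so that $? is timeout's status; after
                # a pipeline it would be grep's status instead.
                out=`timeout --kill-after=5s 5s vgdisplay -v $1 2>&1`
                rc=$?
        done
        # Still timing out after all retries: report that to the caller.
        if [ $rc -eq 124 ] || [ $rc -eq 137 ]; then return $rc; fi
        echo "$out" | grep -i 'Status[[:space:]]*available' >/dev/null 2>&1
        return $?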

which kills the hung vgdisplay if it runs for more than 5 seconds (it never 
should) and retries the operation a few times. This seems to work. Now I can 
kill a node without the cluster falling to pieces and going on a STONITH 
frenzy (actually it sometimes still does, but not for that reason).
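
The kill-and-retry idea described above can also be sketched as a small 
standalone function, which makes it easy to try out with a stand-in command 
such as sleep. This is only a sketch under assumptions: run_with_retry and its 
arguments are hypothetical names, not part of the LVM RA, and it assumes a 
timeout command that accepts -k (the short form of --kill-after in GNU 
coreutils). It checks the status of timeout itself, because after a pipeline 
$? would hold the status of the last command (grep) instead.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not part of the LVM RA).
# Usage: run_with_retry ATTEMPTS SECONDS COMMAND [ARGS...]
# Runs COMMAND under timeout and retries only when it had to be killed.
run_with_retry() {
    attempts=$1
    secs=$2
    shift 2
    rc=124
    while [ "$attempts" -gt 0 ]; do
        attempts=$((attempts - 1))
        # Send TERM after $secs seconds, then KILL after $secs more.
        timeout -k "$secs" "$secs" "$@"
        rc=$?
        case $rc in
            124|137) ;;           # timed out (TERM or KILL): try again
            *) return $rc ;;      # command finished by itself: pass status on
        esac
    done
    return $rc                    # every attempt timed out
}

# Possible use in a monitor function (volume group name is an example):
#   out=$(run_with_retry 3 5 vgdisplay -v vg-drbd 2>&1) || return $?
#   echo "$out" | grep -i 'Status[[:space:]]*available' >/dev/null
```

Returning the last timeout status when every attempt fails lets the caller 
tell "the command hung" apart from "the command ran and reported a problem".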

Maybe someone will find this useful? Or tell me a better way to do it (other 
than fixing the bug in vgdisplay :)?
 
Thanks

James


