Hi,

Thank you for your kind explanation!
I tried the latest fence-agents-3.1.9.

# rpm -e fence-agents-3.1.5-10.el6.x86_64
# wget https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-3.1.9.tar.gz
# tar zxf fence-agents-3.1.9.tar.gz
# cd fence-agents-3.1.9
# ./configure --prefix=/usr --libdir=/usr/lib64 --sysconfdir=/etc --localstatedir=/var
# make install

# echo "option=metadata" > foo
# cat foo | fence_kdump
[error]: action 'off' requires nodename

# echo "action=metadata" > foo
# cat foo | fence_kdump
<?xml version="1.0" ?>
<resource-agent name="fence_kdump" shortdesc="Fence agent for use with kdump">
<longdesc>The fence_kdump agent is intended to be used with with kdump service.</longdesc>
....

fence_baytech, which you mentioned on Bugzilla, supports "action" now.

# echo "action=metadata" > foo
# cat foo | fence_baytech
<?xml version="1.0" ?>
<resource-agent name="fence_baytech" shortdesc="I/O Fencing agent for Baytech RPC switches in combination with a Cyclades Terminal Server" >
<longdesc>
...

I also changed the value of STONITH_ATTR_ACTION_OP to "action" manually for now.
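
The stdin checks above could be wrapped in a small helper to see which key an installed agent accepts for the metadata request. This is only a sketch: probe_metadata_key is a hypothetical name, not part of fence-agents or Pacemaker, and it assumes that an agent which understands the key prints XML containing a <resource-agent element, as in the output above. Note that an agent which does not recognise the key may fall back to its default action (fence_kdump fell back to 'off' above), so this should only be run without connection parameters.

```shell
# Hypothetical helper (not part of fence-agents): print which stdin key,
# "action" or "option", makes the given fence agent emit its XML metadata.
# "action" is tried first, so "option" is only sent to agents that reject it.
probe_metadata_key() {
    pmk_agent="$1"
    for pmk_key in action option; do
        # Fence agents read key=value pairs from stdin, one per line.
        if printf '%s=metadata\n' "$pmk_key" | "$pmk_agent" 2>/dev/null \
                | grep -q '<resource-agent'; then
            echo "$pmk_key"
            return 0
        fi
    done
    # Neither key produced metadata.
    return 1
}
```

For example, "probe_metadata_key fence_kdump" prints the accepted key on stdout and returns non-zero if neither key yields metadata. The result tells you which value STONITH_ATTR_ACTION_OP needs to have for that agent.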
I think it works well :)

# cd ../beekhof/
# git pull
# git show
commit ca505c05b11e2931764653bf675ce948feccce5e
Author: Andrew Beekhof <and...@beekhof.net>
Date:   Fri Aug 3 12:34:16 2012 +1000

    Low: PE: Supress 'multi active' error for fencing devices on unclean nodes

# vim ./include/crm/fencing/internal.h
//#define STONITH_ATTR_ACTION_OP "option" /* To be replaced by 'action' at some point */
#define STONITH_ATTR_ACTION_OP "action" /* To be replaced by 'action' at some point */

# make install
# rm -f /var/lib/pacemaker/cib/*
# rm -f /var/lib/pacemaker/pengine/*
# logrotate -f /etc/logrotate.conf
# service corosync start
# service pacemaker start

# cat /home/crm/trac2051-kdump.crm
property no-quorum-policy="ignore" \
    stonith-enabled="true" \
    startup-fencing="false" \
    stonith-timeout="120s" \
    crmd-transition-delay="2s"

rsc_defaults \
    resource-stickiness="INFINITY" \
    migration-threshold="1"

primitive stonith-1 stonith:fence_kdump \
    params \
        pcmk_host_check="static-list" \
        pcmk_host_list="bl460g6c" \
        pcmk_reboot_action="off" \
        pcmk_monitor_action="metadata" \
        nodename=bl460g6c \
        timeout=180

primitive stonith-2 stonith:fence_kdump \
    params \
        pcmk_host_check="static-list" \
        pcmk_host_list="bl460g6d" \
        pcmk_reboot_action="off" \
        pcmk_monitor_action="metadata" \
        nodename=bl460g6d \
        timeout=180

location location-1 stonith-1 \
    rule -INFINITY: #uname eq bl460g6c
location location-2 stonith-2 \
    rule -INFINITY: #uname eq bl460g6d

# crm configure load update trac2051-kdump.crm
# crm_mon -1
============
Last updated: Mon Aug 6 11:14:18 2012
Last change: Mon Aug 6 11:13:18 2012 via cibadmin on bl460g6c
Stack: corosync
Current DC: bl460g6d (2) - partition with quorum
Version: 1.1.7-e986274
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ bl460g6c bl460g6d ]

 stonith-1 (stonith:fence_kdump): Started bl460g6d
 stonith-2 (stonith:fence_kdump): Started bl460g6c

# ls -l /var/crash/; date
total 0
Mon Aug 6 11:13:57 JST 2012

# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger

# tail -f /var/log/ha-log
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: pe_fence_node: Node bl460g6c will be fenced because the node is no longer part of the cluster
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: determine_online_status: Node bl460g6c is unclean
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: custom_action: Action stonith-2_stop_0 on bl460g6c is unrunnable (offline)
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: custom_action: Action stonith-2_stop_0 on bl460g6c is unrunnable (offline)
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: stage6: Scheduling Node bl460g6c for STONITH
Aug 6 11:14:50 bl460g6d pengine[3605]: notice: LogActions: Stop stonith-2 (bl460g6c)
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: process_pe_message: Transition 2: WARNINGs found during PE processing.
PEngine Input stored in: /var/lib/pacemaker/pengine/pe-warn-0.bz2
Aug 6 11:14:50 bl460g6d crmd[3606]: notice: te_fence_node: Executing reboot fencing operation (9) on bl460g6c (timeout=120000)
Aug 6 11:16:20 bl460g6d stonith-ng[3602]: notice: log_operation: Operation 'reboot' [3644] (call 0 from ebe2612f-0451-4d6a-bf29-9f8323005b2b) for host 'bl460g6c' with device 'stonith-1' returned: 0
Aug 6 11:16:20 bl460g6d stonith-ng[3602]: notice: remote_op_done: Operation reboot of bl460g6c by bl460g6d for bl460g6d[ebe2612f-0451-4d6a-bf29-9f8323005b2b]: OK

# ls -l /var/crash/; date
total 4
drwxr-xr-x 2 root root 4096 Aug 6 11:16 127.0.0.1-2012-08-06-11:16:19
Mon Aug 6 11:20:08 JST 2012

Thanks,
Junko

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org