[Pacemaker] node offline after fencing (pacemakerd hangs)

Ulrich Leodolter Tue, 17 Jul 2012 06:28:02 -0700

hi,

i have set up a very basic 2-node cluster on RHEL 6.3.
the first thing i tried was to set up a stonith/fence_ipmilan
resource.


fencing seems to work: if i kill corosync on one node,
that node is restarted (ipmi reboot) by the other node.

but after the restart the cluster doesn't come back to normal
operation.  it looks like pacemakerd hangs and the
node status is offline.

i found only one way to fix the problem:

killall -9 pacemakerd
service pacemakerd start

after that both nodes are online.  below you can see my
cluster configuration and the corosync.log messages which
repeat forever when pacemakerd hangs.

i am new to pacemaker and followed the "Clusters from Scratch"
guide for the first setup.   information about fence_ipmilan
is from google :-)

can you give me some tips?  what is wrong with this basic cluster
config?  i don't want to add more resources (kvm virtual
machines) until fencing is configured correctly.

thx
ulrich



[root@pcmk1 ~]# crm configure show
node pcmk1 \
        attributes standby="off"
node pcmk2 \
        attributes standby="off"
primitive p_stonith_pcmk1 stonith:fence_ipmilan \
        params auth="password" ipaddr="192.168.120.171" passwd="xxx" lanplus="true" login="pcmk" timeout="20s" power_wait="5s" verbose="true" pcmk_host_check="static-list" pcmk_host_list="pcmk1" \
        meta target-role="started"
primitive p_stonith_pcmk2 stonith:fence_ipmilan \
        params auth="password" ipaddr="192.168.120.172" passwd="xxx" lanplus="true" login="pcmk" timeout="20s" power_wait="5s" verbose="true" pcmk_host_check="static-list" pcmk_host_list="pcmk2" \
        meta target-role="started"
location loc_p_stonith_pcmk1_pcmk1 p_stonith_pcmk1 -inf: pcmk1
location loc_p_stonith_pcmk2_pcmk2 p_stonith_pcmk2 -inf: pcmk2
property $id="cib-bootstrap-options" \
        expected-quorum-votes="2" \
        dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
        no-quorum-policy="ignore" \
        cluster-infrastructure="openais"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"
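
A minimal sketch for cross-checking the configuration above: it lists each stonith primitive together with the node named in its pcmk_host_list, so the pairs can be compared by eye against the -inf location constraints. The function name stonith_targets is hypothetical, and the sketch assumes the configuration text arrives on stdin in the layout shown here.

```shell
# Print "<primitive> <target>" for every stonith primitive read from stdin.
# Hypothetical helper: it only looks at the text, it does not query the cluster.
stonith_targets() {
    awk '
        # Remember the name of the most recent stonith primitive.
        $1 == "primitive" && $3 ~ /^stonith:/ { name = $2 }
        # The first pcmk_host_list that follows belongs to that primitive,
        # whether it sits on the same line or on a continuation line.
        name != "" && match($0, /pcmk_host_list="[^"]*"/) {
            # Skip the 16 characters of pcmk_host_list=" and the closing quote.
            print name, substr($0, RSTART + 16, RLENGTH - 17)
            name = ""
        }
    '
}
```

It could be fed from the command used above, for example: crm configure show | stonith_targets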


/var/log/cluster/corosync.log:

Jul 13 11:29:41 [1859] pcmk2       crmd:     info: do_dc_release:       DC role released
Jul 13 11:29:41 [1859] pcmk2       crmd:     info: do_te_control:       Transitioner is now inactive
Jul 13 11:29:41 [1854] pcmk2        cib:     info: set_crm_log_level:   New log level: 3 0
Jul 13 11:30:01 [1859] pcmk2       crmd:     info: crm_timer_popped:    Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
Jul 13 11:30:01 [1859] pcmk2       crmd:  warning: do_log:      FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jul 13 11:30:01 [1859] pcmk2       crmd:   notice: do_state_transition:         State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jul 13 11:30:01 [1859] pcmk2       crmd:     info: do_election_count_vote:      Election 8 (owner: pcmk1) lost: vote from pcmk1 (Uptime)
Jul 13 11:30:01 [1859] pcmk2       crmd:   notice: do_state_transition:         State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
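
A minimal sketch for measuring how often the loop in the excerpt above repeats: it counts the log lines in which crmd falls back from S_ELECTION to S_PENDING. The function name count_election_fallbacks is hypothetical, and the sketch assumes the log file holds each message on a single line.

```shell
# Count the lines in a log file that record crmd falling back from
# S_ELECTION to S_PENDING. Hypothetical helper; $1 is the log file path.
count_election_fallbacks() {
    # -F treats the pattern as a fixed string, -c prints the number of matching lines.
    grep -c -F 'State transition S_ELECTION -> S_PENDING' "$1"
}
```

For the log quoted here it could be called as: count_election_fallbacks /var/log/cluster/corosync.log
A count that keeps growing while the node stays offline would match the "repeat forever" behaviour described above.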


-- 
Ulrich Leodolter <ulrich.leodol...@obvsg.at>
OBVSG


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
