Hi,

I have set up a very basic 2-node cluster on RHEL 6.3. The first thing I tried was to set up a stonith:fence_ipmilan resource.
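For anyone who wants to check the fence agent outside the cluster first, here is a minimal sketch of a manual status query. It assumes the usual fence_ipmilan command-line flags (-a address, -l login, -p password, -P lanplus, -o action); the function name check_fence is made up for illustration, and by default it only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper, not part of pacemaker or fence-agents.
# Assumed fence_ipmilan flags: -a address, -l login, -p password,
# -P (lanplus), -o action. Check "fence_ipmilan -h" on your system.
#
# Arguments: $1 = IPMI address, $2 = login, $3 = password
# Dry run by default: prints the command. Set RUN=1 to execute it.
check_fence() {
    if [ "${RUN:-0}" = "1" ]; then
        fence_ipmilan -P -a "$1" -l "$2" -p "$3" -o status
    else
        echo "fence_ipmilan -P -a $1 -l $2 -p $3 -o status"
    fi
}
```

Calling `check_fence 192.168.120.172 pcmk xxx` prints the status command for the second node's IPMI address; with `RUN=1` set it would actually query the device.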
Fencing seems to work: if I kill corosync on one node, that node is rebooted via IPMI by the other node. But after the restart the cluster doesn't return to normal operation. It looks like pacemakerd hangs, and the node status is offline. I have found only one way to fix the problem:

    killall -9 pacemakerd
    service pacemakerd start

After that both nodes are online.

Below you can see my cluster configuration and the corosync.log messages, which repeat forever while pacemakerd hangs.

I am new to Pacemaker and followed the "Clusters from Scratch" guide for the first setup. The information about fence_ipmilan is from Google :-)

Can you give me any tips? What is wrong with this basic cluster config? I don't want to add more resources (KVM virtual machines) until fencing is configured correctly.

Thanks,
Ulrich


[root@pcmk1 ~]# crm configure show
node pcmk1 \
        attributes standby="off"
node pcmk2 \
        attributes standby="off"
primitive p_stonith_pcmk1 stonith:fence_ipmilan \
        params auth="password" ipaddr="192.168.120.171" passwd="xxx" lanplus="true" login="pcmk" timeout="20s" power_wait="5s" verbose="true" pcmk_host_check="static-list" pcmk_host_list="pcmk1" \
        meta target-role="started"
primitive p_stonith_pcmk2 stonith:fence_ipmilan \
        params auth="password" ipaddr="192.168.120.172" passwd="xxx" lanplus="true" login="pcmk" timeout="20s" power_wait="5s" verbose="true" pcmk_host_check="static-list" pcmk_host_list="pcmk2" \
        meta target-role="started"
location loc_p_stonith_pcmk1_pcmk1 p_stonith_pcmk1 -inf: pcmk1
location loc_p_stonith_pcmk2_pcmk2 p_stonith_pcmk2 -inf: pcmk2
property $id="cib-bootstrap-options" \
        expected-quorum-votes="2" \
        dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
        no-quorum-policy="ignore" \
        cluster-infrastructure="openais"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"


/var/log/cluster/corosync.log:

Jul 13 11:29:41 [1859] pcmk2 crmd: info: do_dc_release: DC role released
Jul 13 11:29:41 [1859] pcmk2 crmd: info: do_te_control: Transitioner is now inactive
Jul 13 11:29:41 [1854] pcmk2 cib: info: set_crm_log_level: New log level: 3 0
Jul 13 11:30:01 [1859] pcmk2 crmd: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped (20000ms)
Jul 13 11:30:01 [1859] pcmk2 crmd: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jul 13 11:30:01 [1859] pcmk2 crmd: notice: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jul 13 11:30:01 [1859] pcmk2 crmd: info: do_election_count_vote: Election 8 (owner: pcmk1) lost: vote from pcmk1 (Uptime)
Jul 13 11:30:01 [1859] pcmk2 crmd: notice: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]

-- 
Ulrich Leodolter <ulrich.leodol...@obvsg.at>
OBVSG

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org