On 2015-04-17 12:36, Thomas Manninger wrote:
> Hi list,
>
> i have a pacemaker/corosync2 setup with 4 nodes, stonith configured over
> ipmi interface.
>
> My problem is, that sometimes, a wrong node is stonithed.
> As example:
> I have 4 servers: node1, node2, node3, node4
>
> I start a hardware-reset on node node1, but node1 and node3 will be
> stonithed.

You have to tell pacemaker exactly which stonith resource can fence which
node if the stonith agent you are using does not support the "list"
action. Do this by adding "pcmk_host_check=static-list" and
"pcmk_host_list" to every stonith resource, like:

primitive p_stonith_node3 stonith:external/ipmi \
    op monitor interval=3s timeout=20s \
    params hostname=node3 ipaddr=10.100.0.6 passwd_method=file passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus priv=OPERATOR \
    pcmk_host_check="static-list" pcmk_host_list="node3"

... see "man stonithd".

Best regards,
Andreas

>
> In the cluster.log, i found following entry:
> Apr 17 11:02:41 [20473] node2 stonithd: debug:
> stonith_action_create: Initiating action reboot for agent
> fence_legacy (target=node1)
> Apr 17 11:02:41 [20473] node2 stonithd: debug: make_args:
> Performing reboot action for node 'node1' as 'port=node1'
> Apr 17 11:02:41 [20473] node2 stonithd: debug:
> internal_stonith_action_execute: forking
> Apr 17 11:02:41 [20473] node2 stonithd: debug:
> internal_stonith_action_execute: sending args
> Apr 17 11:02:41 [20473] node2 stonithd: debug:
> stonith_device_execute: Operation reboot for node node1 on
> p_stonith_node3 now running with pid=113092, timeout=60s
>
> node1 will be reseted with the stonith primitive of node3 ?? Why??
>
> my stonith config:
> primitive p_stonith_node1 stonith:external/ipmi \
> params hostname=node1 ipaddr=10.100.0.2 passwd_method=file
> passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus
> priv=OPERATOR \
> op monitor interval=3s timeout=20s \
> meta target-role=Started failure-timeout=30s
> primitive p_stonith_node2 stonith:external/ipmi \
> op monitor interval=3s timeout=20s \
> params hostname=node2 ipaddr=10.100.0.4 passwd_method=file
> passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus
> priv=OPERATOR \
> meta target-role=Started failure-timeout=30s
> primitive p_stonith_node3 stonith:external/ipmi \
> op monitor interval=3s timeout=20s \
> params hostname=node3 ipaddr=10.100.0.6 passwd_method=file
> passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus
> priv=OPERATOR \
> meta target-role=Started failure-timeout=30s
> primitive p_stonith_node4 stonith:external/ipmi \
> op monitor interval=3s timeout=20s \
> params hostname=node4 ipaddr=10.100.0.8 passwd_method=file
> passwd="/etc/stonith_ipmi_passwd" userid=stonith interface=lanplus
> priv=OPERATOR \
> meta target-role=Started failure-timeout=30s
>
> Somebody can help me??
> Thanks!
>
> Regards,
> Thomas
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
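
Applied to the whole configuration quoted above, the static-list suggestion
might look roughly like the sketch below. All values are copied from the
quoted primitives; the only additions are pcmk_host_check and
pcmk_host_list. It assumes the cluster node names are exactly node1 to
node4, as in the example, and it has not been run against a live cluster.

```
# Each resource may fence only the node named in its pcmk_host_list.
primitive p_stonith_node1 stonith:external/ipmi \
    params hostname=node1 ipaddr=10.100.0.2 passwd_method=file \
        passwd="/etc/stonith_ipmi_passwd" userid=stonith \
        interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node1" \
    op monitor interval=3s timeout=20s \
    meta target-role=Started failure-timeout=30s
primitive p_stonith_node2 stonith:external/ipmi \
    params hostname=node2 ipaddr=10.100.0.4 passwd_method=file \
        passwd="/etc/stonith_ipmi_passwd" userid=stonith \
        interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node2" \
    op monitor interval=3s timeout=20s \
    meta target-role=Started failure-timeout=30s
primitive p_stonith_node3 stonith:external/ipmi \
    params hostname=node3 ipaddr=10.100.0.6 passwd_method=file \
        passwd="/etc/stonith_ipmi_passwd" userid=stonith \
        interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node3" \
    op monitor interval=3s timeout=20s \
    meta target-role=Started failure-timeout=30s
primitive p_stonith_node4 stonith:external/ipmi \
    params hostname=node4 ipaddr=10.100.0.8 passwd_method=file \
        passwd="/etc/stonith_ipmi_passwd" userid=stonith \
        interface=lanplus priv=OPERATOR \
        pcmk_host_check="static-list" pcmk_host_list="node4" \
    op monitor interval=3s timeout=20s \
    meta target-role=Started failure-timeout=30s
```

Each pcmk_host_list names only the node whose IPMI address the resource
points at, so a reboot request for node1 should no longer be handed to
p_stonith_node3 as it was in the quoted log.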