I have had some luck getting STONITH to work, but I am still running
into a problem that I cannot figure out how to debug.
In the ha.cf on each host I have:
stonith_host mds2.engin.umich.edu external/ipmi mds2.engin.umich.edu mds2-m.engin.umich.edu root PASSWORD
stonith_host mds1.engin.umich.edu external/ipmi mds1.engin.umich.edu mds1-m.engin.umich.edu root PASSWORD
Heartbeat does now try to reset the node on which I kill heartbeat.
In the log I see:
heartbeat[12013]: 2008/07/31_15:47:56 info: Resetting node mds1.engin.umich.edu with [IPMI STONITH device]
heartbeat[12013]: 2008/07/31_15:47:57 info: glib: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi reset mds1.engin.umich.edu' returned 256
heartbeat[12013]: 2008/07/31_15:47:57 ERROR: glib: external_reset_req: 'ipmi reset' for host mds1.engin.umich.edu failed with rc 256
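A note on the "rc 256" above: if that number is a raw wait status of
the kind returned by system() or pclose() (an assumption on my part;
the log does not say so), it would mean the plugin script exited with
code 1 rather than being killed by a signal. A minimal sketch of that
decoding, where decode_wait_status is a hypothetical helper name and
the stopped/continued encodings are ignored for simplicity:

```shell
# Hypothetical helper: decode a raw wait status the way the
# WIFEXITED/WEXITSTATUS/WTERMSIG macros conventionally do on Linux.
# Assumption: the "rc" in the heartbeat log is such a raw status.
decode_wait_status() {
    status=$1
    # The low 7 bits hold the terminating signal (0 means a normal exit).
    signal=$(( status & 127 ))
    if [ "$signal" -eq 0 ]; then
        # For a normal exit, bits 8-15 hold the exit code.
        echo "exit code $(( (status >> 8) & 255 ))"
    else
        echo "killed by signal $signal"
    fi
}

decode_wait_status 256
```

If that reading is right, the thing to chase is why the ipmi script
itself returns 1 when heartbeat calls it, not a crash of the plugin.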
I can run:
stonith -t external/ipmi -p "mds1.engin.umich.edu mds1-m.engin.umich.edu root PASSWORD" -T reset mds1.engin.umich.edu
and the dead node restarts. From the documentation for 1.x-style
configs, I cannot tell how to debug why the stonith_host lines do
not work.
mds1 and mds2 are the nodes of the cluster; mds1-m and mds2-m are
the hostnames of the IPMI devices, which have LAN configs set up. Note
that stonith works fine from the command line, just not from within
heartbeat.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985