On 17.06.2013 19:16, Digimer wrote:
On 06/17/2013 12:30 PM, Elmar Marschke wrote:

On 17.06.2013 15:59, Digimer wrote:
On 06/17/2013 09:53 AM, andreas graeper wrote:
hi,
I will not have a stonith device. I can test an 'expert power
control 8212' for a day, but in the end I will be without one.

This is an extremely flawed approach. Clustering with shared storage and
without stonith will certainly cause data loss or corruption eventually.
I can not stress this enough.

hi all,

just an idea, or rather a question: what about using DRBD's ability
to automatically handle split-brain situations instead of "real
stonithing"? Maybe like this (global_common.conf):

    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        pri-lost-after-sb "/usr/local/sbin/reboot.sh";
    }

    net {
        after-sb-0pri discard-least-changes;
        after-sb-1pri call-pri-lost-after-sb;
        after-sb-2pri call-pri-lost-after-sb;
    }

Couldn't this work like a "poor man's stonith device"?
(Of course this reboots the whole node with all resources and discards
the node with the least changes, but maybe there are situations where
this doesn't matter?)

regards

Elmar

There are two issues here.

First, Pacemaker/corosync needs fencing anyway, and it has a very large
array of supported fence devices. These are very well tested in the field.

Second, if you put fencing into DRBD directly, you are duplicating
effort and configs. The 'crm-fence-peer.sh' script was written to "hook"
DRBD's fencing into the existing pacemaker fencing. This way, you have
one place to configure and maintain, rather than two.
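
For illustration, a minimal sketch of that hook in a DRBD resource might
look roughly like the following. The resource name "r0" is hypothetical,
the script paths assume a default install under /usr/lib/drbd, and the
"fencing" option is shown in the disk section as in DRBD 8.x (other
versions may place it elsewhere), so check it against the documentation
for your version:

    resource r0 {
        disk {
            # On loss of the peer, freeze I/O and call the
            # fence-peer handler before continuing.
            fencing resource-and-stonith;
        }
        handlers {
            # Adds a constraint in pacemaker that keeps the
            # outdated peer from being promoted.
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # Removes that constraint once the peer has resynced.
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }

The actual power fencing still lives in pacemaker's stonith
configuration; DRBD only asks pacemaker to act on its behalf.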

Back to this specific case:

Andreas tested by failing corosync. This causes pacemaker to see
the node as failed and try to recover the services on the backup node.
All of this happens without DRBD directly knowing what is going on. Had
Andreas configured fencing, pacemaker would have fenced the peer as soon
as it saw the failure. The peer would have been shut down, and DRBD would
have known something was wrong (and blocked) before a split-brain could
occur.

It would also mean that pacemaker would not have recovered/promoted the
surviving node until the peer was off, which also protects against a
split-brain.



Thanks Digimer for pointing out the differences between these approaches... surely a good way to find a cleaner solution.

best regards
e.


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
