Serge Dubrouski wrote:
On Wed, Mar 10, 2010 at 6:59 PM, Martin Aspeli <optilude+li...@gmail.com> wrote:
Serge Dubrouski wrote:
On Wed, Mar 10, 2010 at 5:30 PM, Martin Aspeli <optilude+li...@gmail.com> wrote:
Martin Aspeli wrote:
Hi folks,

Let's say we have a two-node cluster with DRBD and OCFS2, with a database
server that's supposed to be active on one node at a time, using the
OCFS2 partition for its data store.

If we detect a failure on the active node and fail the database over to
the other node, we need to fence off the shared storage in case the
active node is still writing to it.

Can this be done in such a way that the local DRBD/OCFS2 refuses to
accept writes from the now-presumed-dead node? I guess this would be
similar to putting an access rule on a SAN to block off the previously
active node from attempting to read or write any data.

Is this feasible?
We went off on a side-track, I think, but I'd still like to know the
answer:
Can one "fence" at the DRBD level?

From the thread, it sounds like we'll not use OCFS2 for the Postgres data
store, but would still use DRBD, e.g. with ext4 or whatever. The fencing
problem would then be equally, if not more, acute.

It's basically between doing something at the DRBD level, if that's
feasible, or using the DRAC IPMI device on our server to shoot it.
But if you implement fencing on the Pacemaker level and include your
DRBD/Filesystem resources in the Pacemaker configuration, you'll be fine.
Sorry, I don't quite understand what you mean.

What would "fencing on the Pacemaker level" look like? Certainly, DRBD would
be managed by the cluster.


That means that you have to implement STONITH through DRAC or any
other device that provides fencing capability. In this case, if
Pacemaker detects a split-brain situation, it'll kill one of the nodes.

Right, that makes sense.

I was wondering, though, if fencing at the DRBD level would get around the possible problem of a full power outage taking the fencing device down.

In my poor understanding of things, it'd work like this:

 - Pacemaker runs on master and slave
 - Master loses all power
 - Pacemaker on slave notices something is wrong, and prepares to start up postgres on slave, which will now also be the one writing to the DRBD disk
 - Before it can do that, it wants to fence off DRBD
 - It does that by saying to the local DRBD, "even if the other node tries to send you stuff, ignore it". This would avoid the risk of data corruption on slave. Before master could come back up, it'd need to wipe its local partition and re-sync from slave (which is now the new primary).
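(For what it's worth, DRBD's own resource-level fencing looks close to this: when the replication link goes away, a fence-peer handler asks the cluster to forbid the peer from taking over the Primary role before I/O continues. A rough sketch, assuming DRBD 8.3 and the crm-fence-peer.sh / crm-unfence-peer.sh scripts that ship with it; the resource name "r0" is a placeholder:

  resource r0 {
    disk {
      fencing resource-only;   # resource-and-stonith would also freeze I/O until the peer is confirmed down
    }
    handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    # devices, disks and node addresses as usual
  }

The handler puts a constraint into the CIB so the other node can't be promoted until it has resynced, which covers the "ignore the dead peer" part; it can't, by itself, stop a node that is still alive but outside the cluster's control from scribbling on its local disk, which is why STONITH is still recommended on top.)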

Martin


