Hi Dejan,
Dejan Muhamedagic wrote:
Hi,
On Fri, Mar 05, 2010 at 10:00:06AM +0800, Martin Aspeli wrote:
Hi,
I'm pretty new to all this stuff, but I've read pretty much all the
documentation on the clusterlabs website. I'm seeking a bit of
clarification/confirmation on how to achieve certain things, in
particular around fencing/STONITH, before we dive into trying to set
this up.
We're using SuSE Linux Enterprise 11, and will probably be buying
their HA Extension, which includes OpenAIS and Pacemaker. We're
using Dell servers, with their DRAC on-board management console.
We have two physical app servers, master and slave. These are behind
a firewall which has some internal load balancing support (at least
basic round-robin).
On master, we will be running:
- an HAProxy software load balancer
- 8 Zope app server processes
- memcached
- PostgreSQL
- A "blob" file store
On slave, we'll have:
- a hot standby HAProxy instance
- 8 further Zope app server processes, which are active
- a hot standby memcached
- PostgreSQL (probably in cold standby)
I don't know if the pgsql RA can support "cold standby"
instances.
Well, I don't mind how it works, so long as it works. ;) What is the
usual approach with Postgres?
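For what it's worth, the shape I had imagined (absent anything cleverer) was just the stock RA as a plain primitive that fails over together with its storage, roughly:

    primitive p_pgsql ocf:heartbeat:pgsql \
        params pgdata="/var/lib/pgsql/data" pgport="5432" \
        op start timeout="120s" \
        op stop timeout="120s" \
        op monitor interval="30s" timeout="30s"

(the paths and timeouts are guesses and the snippet is untested; crm ra info ocf:heartbeat:pgsql should list the real parameters) -- i.e. one active instance that the cluster moves around, rather than anything resembling a warm standby.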
Also note:
- We intend to use the IPaddr2 resource agent to cluster an IP
address across master and slave.
- I'm not sure we need to use Pacemaker to manage HAProxy on slave;
it will simply not be used until the IP address fails over to slave.
The difference is that if it fails, the cluster won't be able to
help you. Otherwise, you can configure it as a cloned resource.
Yeah, maybe that's more appropriate. I guess this would mean writing
custom RA scripts, right?
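If we go the clone route, I assume the cluster configuration itself is simple enough, something like this (assuming LSB init scripts named haproxy and memcached exist and behave properly; untested):

    primitive p_haproxy lsb:haproxy \
        op monitor interval="15s"
    primitive p_memcached lsb:memcached \
        op monitor interval="15s"
    clone cl_haproxy p_haproxy
    clone cl_memcached p_memcached

so the real work would be in the init/OCF scripts themselves.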
- To deal with sometimes severe peaks in our load, we'll have
HAProxy on master send certain requests to the "live" Zope app
server processes on slave. HAProxy deals with Zope processes going
up and down, so we don't really need to cluster these per se.
- Zope communicates with Postgres. We intend that connection string
to use the floating IP address, so that if Postgres fails over to
slave, Zope will be unaware.
- Memcached is used by Zope to cache certain Postgres database
queries, so it would be similar. We can have this on hot standby (if
that's easier?) since it only manages data in local RAM, but the
memcached connection string would use the floating IP address too.
- Zope writes certain "blob" files to the filesystem. All Zope
clients (across both servers) need a shared blob directory. They do
implement locking on this directory, so concurrent access is not a
problem.
Now for the bits I'm less sure about:
- We were thinking of creating a DRBD partition with OCFS2 for the
Postgres data + blob data. IIUC, that setup handles multiple nodes
writing, so the blob storage should work fine (since Zope will
ensure the integrity of the directory).
OK.
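Concretely, I was picturing something along these lines for the dual-primary DRBD + OCFS2 part (resource and device names are invented, this is untested, and as I understand it the DLM/O2CB clones also need to be configured on SLES 11):

    primitive p_drbd ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="20s" role="Master" \
        op monitor interval="30s" role="Slave"
    ms ms_drbd p_drbd \
        meta master-max="2" notify="true" interleave="true"
    primitive p_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/srv/cluster" fstype="ocfs2" \
        op monitor interval="20s"
    clone cl_fs p_fs meta interleave="true"
    colocation col_fs_on_drbd inf: cl_fs ms_drbd:Master
    order o_drbd_before_fs inf: ms_drbd:promote cl_fs:start

with master-max="2" being what allows both nodes to mount the OCFS2 filesystem at once.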
- We were thinking of using the pgsql resource agent that comes with
Pacemaker to manage Postgres.
- The postgres data would need fencing when failing over, from what
I understand. I read the notes that using an on-board device like
Dell's DRAC to implement STONITH is not a good idea. We don't have
the option at this stage to buy a UPS-based solution (we do have
UPS, but it can't be used to cut power to individual servers). We do
have two pairs of NICs in each server, one of which would be used
"crossover" between master and slave.
The problem with lights-out devices such as DRAC is that if they
lose power then fencing doesn't work. But if you have them
connected to a reliable UPS, then DRAC should be OK.
Yeah, it's backed up by a diesel generator, so that should be fine, I guess.
On the other hand, if the power supply blows up on the server, it may
still go wrong. That's probably acceptable risk, though.
Given this, what is the best way to implement fencing in this
situation? Could we use DRBD to just refuse master write access to
the slave disk? Could we accept a bit more risk and say that STONITH
will succeed even if *communication* with the DRAC fails, but will
try to use DRAC if it can reach it?
This is not possible. If the fencing action fails, the cluster
won't make any progress.
I was wondering if we could trick it with a custom fencing agent so that
it thinks it has succeeded when in fact it failed due to an inability to
communicate with the DRAC (as opposed to an error code returned by a
responsive DRAC, which would be of more concern). But I concede that
sounds like a dumb idea. ;)
This may solve the "fencing
indefinitely" problem when postgres is failing over due to a power
outage on master, and Pacemaker can't find DRAC to kill master.
On two-node clusters fencing replaces quorum so it is
indispensable.
Yeah, I get that.
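So presumably we end up with the two-node properties plus one fencing device per node, along the lines of the sketch below (IP addresses, credentials and the external/drac5 parameter names are placeholders from memory; stonith -t external/drac5 -n should list the real ones):

    property stonith-enabled="true" \
        no-quorum-policy="ignore"
    primitive st_master stonith:external/drac5 \
        params hostname="master" ipaddr="10.0.0.201" \
               userid="root" passwd="secret" \
        op monitor interval="60m"
    primitive st_slave stonith:external/drac5 \
        params hostname="slave" ipaddr="10.0.0.202" \
               userid="root" passwd="secret" \
        op monitor interval="60m"
    location l_st_master st_master -inf: master
    location l_st_slave st_slave -inf: slave

with the location constraints keeping each device off the node it is supposed to kill.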
- If HAProxy or memcached on master fails (in the software sense),
we'd need to fail over the floating IP address so that the front-end
firewall and the Zope connection strings would continue to work,
even though we have hot standbys on slave. Is this the right thing
to do?
Looks like it.
Cool. If HAProxy and memcached become clone services running on both
nodes, that'd let us use pacemaker to keep them ticking?
If so, I'd appreciate some pointers on how to configure this.
I can't find any resource agents for memcached/HAProxy that ship with
Pacemaker; perhaps it'd be better to create them and let Pacemaker
manage everything?
It is also possible to use init scripts (lsb). I guess that those
exist, just test them thoroughly. If you let the cluster manage
them, they can be monitored.
We actually use supervisord currently, which itself is a daemon that
manages the processes as non-daemons, and provides some optional tools
like auto-restart if they crash, logging and a web console to start/stop
services. However, that's not a firm requirement, and I realise it
overlaps with what Pacemaker does to start, stop, monitor, etc.
Hence, we don't have init scripts, but I've written them before, so I'm
assuming writing basic OCF scripts wouldn't be that hard.
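For the record, the shape I'd expect such a script to take is roughly this, using memcached as the example (an untested sketch; a real agent needs proper meta-data, validation, and a configurable user/port):

    #!/bin/sh
    # Sketch of a minimal OCF-style resource agent for memcached (untested).
    # Standard OCF exit codes:
    OCF_SUCCESS=0
    OCF_ERR_GENERIC=1
    OCF_ERR_UNIMPLEMENTED=3
    OCF_NOT_RUNNING=7

    PIDFILE=/var/run/memcached-ocf.pid

    memcached_monitor() {
        # Running if the pidfile points at a live process.
        if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
            return $OCF_SUCCESS
        fi
        return $OCF_NOT_RUNNING
    }

    memcached_start() {
        memcached_monitor && return $OCF_SUCCESS
        # -d: daemonise, -u: run as user, -P: write pidfile
        memcached -d -u nobody -P "$PIDFILE" || return $OCF_ERR_GENERIC
        return $OCF_SUCCESS
    }

    memcached_stop() {
        memcached_monitor || return $OCF_SUCCESS
        kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
        return $OCF_SUCCESS
    }

    case "$1" in
        start)     memcached_start ;;
        stop)      memcached_stop ;;
        monitor)   memcached_monitor ;;
        meta-data) echo '<resource-agent name="memcached"/>' ;;  # stub; real XML required
        *)         exit $OCF_ERR_UNIMPLEMENTED ;;
    esac
    exit $?

Whether we then keep supervisord underneath or drop it entirely is a separate question.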
In that case, how do we manage the
connection string issue (making sure Zope talks to the right thing)
if not by failing over an IP address?
You lost me here. The IP address is going to fail over. I don't
see what the "connection string issue" is.
Zope is running on both master and slave. There is a floating IP address
192.168.245.10. Each Zope instance (on both servers) has a connection
string for memcached, e.g. 192.168.245.10:11211 in the default scenario.
There's also a memcached on slave:11211, either running (but unused) or
waiting to be started up by pacemaker.
Let's say memcached dies on master, but the rest of master is happy, as
are the Zope instances on slave. Pacemaker fails memcached over to
slave. However, all the active Zope instances still have in their config
file that it's on 192.168.245.10:11211, which is on master, where
memcached has died.
What's the usual approach here? Group the IP address with the services,
so that if anything fails (the full node, the IP address bind, memcached
on master, postgres on master), all services fail over? Or create a
virtual IP for each service (which would probably mean we'd run out of
NICs)? Or do something else clever? :-)
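To make the first of those options concrete, I mean something like this (reusing the primitive names from the sketches above, minus the clones; the netmask is a guess and all of it is untested):

    primitive p_ip ocf:heartbeat:IPaddr2 \
        params ip="192.168.245.10" cidr_netmask="24" \
        op monitor interval="10s"
    group g_core p_ip p_memcached p_pgsql
    colocation col_core_on_fs inf: g_core cl_fs

so that whenever any member has to move, the whole lot (including the address) lands on the same node.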
Martin
--
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book