Hi,

On Fri, Mar 05, 2010 at 10:50:30PM +0800, Martin Aspeli wrote:
> Hi Dejan,
>
> Dejan Muhamedagic wrote:
> >Hi,
> >
> >On Fri, Mar 05, 2010 at 10:00:06AM +0800, Martin Aspeli wrote:

[...]

> >> - I'm not sure we need to use Pacemaker to manage HAProxy on slave;
> >>it will simply not be used until the IP address fails over to slave.
> >
> >The difference is that if it fails, the cluster won't be able to
> >help you. Otherwise, you can configure it as a cloned resource.
>
> Yeah, maybe that's more appropriate. I guess this would mean writing
> custom RA scripts, right?
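For what it's worth, a cloned HAProxy doesn't necessarily need a custom
RA. A rough crm shell sketch, assuming the distro ships an LSB-compliant
/etc/init.d/haproxy on both nodes (all resource names here are invented):

  # Rough sketch; monitoring of an lsb resource relies on the init
  # script's "status" action, so that has to work properly.
  primitive p_haproxy lsb:haproxy \
      op monitor interval="10s" timeout="20s"
  clone cl_haproxy p_haproxy \
      meta clone-max="2" clone-node-max="1"

The same pattern would work for memcached, if its init script is reliable.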
It really depends on the nature of the resource, but in general, yes,
you'd be better off with monitoring.

[...]

> >> - The postgres data would need fencing when failing over, from what
> >>I understand. I read the notes that using an on-board device like
> >>Dell's DRAC to implement STONITH is not a good idea. We don't have
> >>the option at this stage to buy a UPS-based solution (we do have
> >>UPS, but it can't be used to cut power to individual servers). We do
> >>have two pairs of NICs in each server, one of which would be used
> >>"crossover" between master and slave.
> >
> >The problem with lights-out devices such as DRAC is that if they
> >lose power then fencing doesn't work. But if you have them
> >connected to a UPS which is reliable then DRAC should be OK.
>
> Yeah, it's backed up by a diesel generator, so that should be fine,
> I guess.
>
> On the other hand, if the power supply blows up on the server, it
> may still go wrong. That's probably an acceptable risk, though.

Well, probably not. I hope that your servers have dual power supplies.
Those things do tend to break. Some lights-out devices come equipped
with a battery too.

[...]

> >> - If HAProxy or memcached on master fails (in the software sense),
> >>we'd need to fail over the floating IP address so that the front-end
> >>firewall and the Zope connection strings would continue to work,
> >>even though we have hot standbys on slave. Is this the right thing
> >>to do?
> >
> >Looks like it.
>
> Cool. If HAProxy and memcached become clone services running on both
> nodes, that'd let us use pacemaker to keep them ticking?

They'd probably be ticking by themselves, but pacemaker may be able to
find out if they stopped, etc.

> >>If so, I'd appreciate some pointers on how to configure this.
> >>There are no resource agents that ship with Pacemaker I can find for
> >>memcached/HAProxy, though perhaps it'd be better to create them and
> >>let Pacemaker manage everything?
> >
> >It is also possible to use init scripts (lsb). I guess that those
> >exist, just test them thoroughly. If you let the cluster manage
> >them, they can be monitored.
>
> We actually use supervisord currently, which itself is a daemon that
> manages the processes as non-daemons, and provides some optional
> tools like auto-restart if they crash, logging and a web console to
> start/stop services. However, that's not a firm requirement, and I
> realise it overlaps with what pacemaker is doing to start, stop,
> monitor etc.

Yes. Well, perhaps supervisord may fit the bill too.

> Hence, we don't have init scripts, but I've written them before, so
> I'm assuming writing basic OCF scripts wouldn't be that hard.

Nope, though you have to make sure that they actually work ;-)

There's also an RA called "anything" which can run any daemon or
process (hence the name :)
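To make that concrete, a minimal sketch of running memcached through the
"anything" RA. The binary path, options and pidfile below are assumptions;
check crm ra info ocf:heartbeat:anything for the exact parameter list on
your version:

  # Rough sketch only; paths and options are made up.
  primitive p_memcached ocf:heartbeat:anything \
      params binfile="/usr/bin/memcached" \
          cmdline_options="-u nobody -m 64 -p 11211" \
          pidfile="/var/run/memcached.pid" \
      op monitor interval="30s" timeout="20s"

As far as I recall, the RA starts the process itself and tracks its pid,
so the daemon should not be told to background itself. The same approach
would work for the other processes supervisord currently runs.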
> >>In that case, how do we manage the
> >>connection string issue (making sure Zope talks to the right thing)
> >>if not by failing over an IP address?
> >
> >You lost me here. The IP address is going to fail over. I don't
> >see where the "connection string issue" is.
>
> Zope is running on both master and slave. There is a floating IP
> address 192.168.245.10. Each Zope instance (on both servers) has a
> connection string for memcached, e.g. 192.168.245.10:11211 in the
> default scenario. There's also a memcached on slave:11211, either
> running (but unused) or waiting to be started up by pacemaker.
>
> Let's say memcached dies on master, but the rest of master is happy,
> as are the Zope instances on slave. pacemaker fails memcached over
> to slave. However, all the active Zope instances still have in their
> config file that it's on 192.168.245.10:11211, which is on master,
> where memcached has died.
>
> What's the usual approach here? Group the IP address with the
> services, so that if anything fails (the full node, the IP address
> bind, memcached on master, postgres on master), all services fail
> over?

Yes (see the sketch appended at the end of this mail).

> Or create a virtual IP for each service (which would probably
> mean we'd run out of NICs)?

No, you wouldn't, since these are only virtual NICs.

> Or do something else clever? :-)

All this reminds me of LVS or similar. But if the slave can't serve
requests, I don't see the point of it. To start faster?

Thanks,

Dejan

> Martin
>
> --
> Author of `Professional Plone Development`, a book for developers who
> want to work with Plone. See http://martinaspeli.net/plone-book
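P.S. To make the "group the IP address with the services" idea a bit
more concrete, here is a rough crm configure sketch of that variant (as
opposed to cloning HAProxy/memcached on both nodes). Every name, address,
agent choice and parameter below is an assumption for illustration only;
check the metadata of each agent with crm ra info before relying on it:

  primitive p_ip ocf:heartbeat:IPaddr2 \
      params ip="192.168.245.10" cidr_netmask="24" \
      op monitor interval="10s"
  primitive p_pgsql ocf:heartbeat:pgsql \
      op monitor interval="30s" timeout="30s"
  primitive p_memcached ocf:heartbeat:anything \
      params binfile="/usr/bin/memcached" \
          cmdline_options="-u nobody -m 64 -p 11211" \
          pidfile="/var/run/memcached.pid" \
      op monitor interval="30s"
  primitive p_haproxy lsb:haproxy \
      op monitor interval="10s"
  # A group implies ordering and colocation, so the whole stack
  # (including the IP) stays together and moves as one unit.
  group g_stack p_ip p_pgsql p_memcached p_haproxy
  # Move away after the first failure instead of restarting in place.
  rsc_defaults migration-threshold="1"
  # Fencing via the DRAC cards (hostnames, addresses and credentials
  # are guesses); each device is kept off the node it should fence.
  primitive st_master stonith:external/drac5 \
      params hostname="master" ipaddr="10.0.0.101" \
          userid="root" passwd="secret"
  primitive st_slave stonith:external/drac5 \
      params hostname="slave" ipaddr="10.0.0.102" \
          userid="root" passwd="secret"
  location l_st_master st_master -inf: master
  location l_st_slave st_slave -inf: slave
  property stonith-enabled="true"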