On Tue, 2008-07-15 at 16:20 +0100, Pete French wrote:
> > However, I must ask you this: why are you doing things the way you are?
> > Why are you using the equivalent of RAID 1 but for entire computers? Is
> > there some reason you aren't using a filer (e.g. NetApp) for your data,
> > thus keeping it centralised?
>
> I am not the original poster, but I am doing something very similar and
> can answer that question for you. Some people get paranoid about the
> whole "single point of failure" thing. I originally suggested that we buy
> a filer and have identical servers so if one breaks we connect the other
> to the filer, but the response I got was "what if the filer breaks?". So
> in the end I had to show we have duplicate independent machines, with the
> data kept symmetrical on them at all times.
>
> It does actually work quite nicely - I have an "active" database
> machine, and a "passive". The passive is only used if the active fails,
> and the drives are run as a gmirror pair with the remote one being mounted
> using ggated. It also means I can flip from active to passive when I want
> to do an OS upgrade on the active machine. Switching takes a few seconds,
> and this is fine for our setup.
>
> So the answer is that the decision was taken out of my hands - but this
> is not uncommon, and as a roll-your-own cluster it works very nicely.
>
> -pete.

I have for now gone with using ggate[cd] along with zpool, and so far it's
not bad. I can fail the master and stop ggated on the slave, at which point
geom reads the glabeled disks. From there I can zpool import to an alternate
root. When the master comes back up I can zpool export on the slave and then,
on the master, zpool import, at which point zfs handles the resilvering.

The *big* issue I have right now is dealing with the slave machine going
down. Once the master no longer has a connection to the ggated devices, all
processes trying to use the device hang in D status. I have tried pkill'ing
ggatec to no avail, and ggatec destroy returns a message of gctl being busy.
Trying ggatec destroy -f panics the machine.

Does anyone know how to successfully time out a failed ggatec connection so
that I can zpool detach or somehow have zfs remove the unavailable drive?

Sven
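
For reference, the failover sequence described above can be sketched roughly as the script below. This is only an outline under assumptions: the pool name `tank` and alternate root `/failover` are hypothetical placeholders, and the `run` wrapper only prints each command unless RUN=1 is set. The import on the slave may additionally need -f if the pool was never cleanly exported by the failed master.

```shell
#!/bin/sh
# Dry-run sketch of the failover steps described in the message above.
# POOL and ALTROOT are hypothetical placeholders, not values from the thread.
POOL="${POOL:-tank}"
ALTROOT="${ALTROOT:-/failover}"

# Print each command instead of executing it, unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# On the slave, after the master has failed.
slave_takeover() {
    # Stop exporting the local disks so geom sees the glabeled providers.
    run pkill ggated
    # Import the pool under an alternate root.
    # (Assumption: -f may be needed if the master never exported the pool.)
    run zpool import -R "$ALTROOT" "$POOL"
}

# On the slave, once the master is back up.
slave_release() {
    run zpool export "$POOL"
}

# On the master, after the slave has exported the pool;
# zfs then handles the resilvering.
master_resume() {
    run zpool import "$POOL"
}
```

Sourcing the file and calling `slave_takeover`, `slave_release` and `master_resume` in that order prints the commands for one full failover and failback cycle. It does not address the hang that occurs when the slave itself goes down.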