Re: [vchkpw] My single point of failure... failed

mlist Mon, 08 Oct 2007 05:37:33 -0700

DAve wrote:

Tren Blackburn wrote:
Hi DAve;
-----Original Message-----
From: DAve [mailto:[EMAIL PROTECTED]
Sent: Friday, October 05, 2007 11:39 AM
To: vpopmail
Subject: [vchkpw] My single point of failure... failed

I got bit hard this morning and I am looking for a solution. I have been
slowly getting our email system up to snuff moving from a pair of
servers to two gateway AV scanners, three vpopmail toasters, and two
outbound qmail servers. The toasters mount the Maildirs via NFS, the AV
scanners talk to the toasters via milter-ahead, and the NFS mailstore
hosts MySQL for vpopmail.

I've just gotten load balancers installed and moved the outbound traffic
there first, getting a good load test on vpopmaild for smtp-auth. I had
promised to provide the scripts and now I am actually seeing how well
they work.

Problems arose when my NFS server went stupid this morning and all mail
stopped. AV scanners couldn't verify mailboxes because the toasters
couldn't see MySQL, the outbound servers couldn't do smtp-auth for the
same reason. It wouldn't have mattered anyway because my Maildirs were
offline. NFS is my single point of failure, even though it is RAID5,
dual NIC, dual power supply (SUN Enterprise 250), it went offline.

I need to fix that, I can cluster MySQL but I am looking for ways to
have either a clustered NFS with rw permissions and appropriate
locking/syncing, or NFS failover from the toasters.

I am looking at GFS and active/active NFS and HaNFS. Has anyone gone
down this path yet?

I have.  There's a couple ways of doing this.  I've never played with
GFS so I can't comment on that.  The easiest solution I've found is
doing an Active/Standby configuration between 2 nodes using DRBD to
replicate the data in real time.  There's quite a few solutions out
there to handle resource seizure on node failure.  If you want
absolutely simple, go heartbeat v1.  If you want to break your mailstore
into 2 pieces (I have no idea how large of a mailstore you're working
with.  Mine is breaking 70G pretty soon) then you can do an
Active/Active configuration using the High Availability manager from
LinuxHA.net.  I like that product mainly because it's written
specifically for 2 node active/active clusters.  And if you really want
to muddy the waters, you can go with heartbeat v2 (I still have a bad
taste in my mouth from it though)

It's always best to keep major components on their own sets of boxen.
My MySQL servers are a 2 node load balanced multi-master replicated
pair.  My Mailstore is a 2 node Active/Passive pair as described above
(I cheat a bit and do some iSCSI exports on the "passive" box to the
Windows people who demanded I share my storage with them.  It's also
handled by the HA software, so if the box exporting the iSCSI targets
goes down, it shuffles across to the NFS box, and vice-versa)
My inbound/outbound SMTP is across 4 dedicated load-balanced boxen.
IMAP4(s)/POP3(s) is on its own pair and same with Web.

If any of this seems useful to you let me know. No one should have to
go through the nightmare of a key server going down.  I hate getting
yelled at.  :)

I am at least on the right or similar track. Here is some more
background.

Currently the gateways run
MailScanner/sendmail/spamassassin/clamav/bitdefender, we have
vpopmail/chkuser on the eclusters (toasters) providing pop and webmail,
and the outbound servers provide smtp and smtp-auth (to become
smtp-auth only) also running spamassassin and clamav via simscan.

Everything sits behind a PIX and everything will eventually sit behind
two Coyote Point EQ350si devices. Right now only the outbound servers
are being load balanced.

I am liking the look of HaNFS and DRBD but I have to look toward the
future which involves sending half my mail system to a remote NOC. We
have a dedicated 1Gb fiber to provide a private LAN between the NOCs.
My concern is over resyncing the mailstores after a fiber failure,
which I KNOW will happen sooner or later. Not real sure if
active/active or active/passive will be the best option, resyncing in
general doesn't look inviting. My mailstore is only 60GB, few clients
use webmail, most download everything all day. But it would certainly
be a concern.

When I set up MySQL as a cluster I will also be installing a local RO
slave on each ecluster (toaster), just for auth purposes.

I am assuming you found no problems running vpopmail/qmail on your
mailstores? How do you handle failover? Any problems with qmail-local
during deliveries?

Thanks for the response.

DAve

This is my setup, it seems to work fairly well. I was using NFS for the
mail stores at one point but because I couldn't get a handle on my
performance problems I dropped it and put the mail stores on the local
machine.

I have two machines with two drives in each machine. Disk sda1 on each
machine is the OS, sda2 is configured via drbd (in retrospect I should
have raided my drbd device . . . too late now). I have on the drbd
device 1) the mail stores, 2) my vpopmail ~/etc files, 3) my qmail
control files, and 4) mysql database files. Heartbeat is set to
auto_failback off - that way when one machine goes down (and comes up) I
won't encounter a split-brain scenario where each machine is offering
the same services.
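
For readers unfamiliar with the setting, here is a toy model of what
"auto_failback off" means for a two-node active/standby pair. It is not
heartbeat code. The class and the node names mail1/mail2 are made up for
illustration. It only shows that with failback off the resources stay on
whichever node took them over, even after the preferred node returns.

```python
class TwoNodePair:
    """Toy model of an active/standby pair (illustrative, not heartbeat)."""

    def __init__(self, preferred, standby, auto_failback):
        self.preferred = preferred          # node that normally owns the services
        self.auto_failback = auto_failback  # mirrors the on/off idea only
        self.up = {preferred, standby}      # nodes currently alive
        self.owner = preferred              # node currently running the services

    def node_down(self, node):
        """Mark a node dead; move the services if it was the owner."""
        self.up.discard(node)
        if self.owner == node:
            survivors = sorted(self.up)
            self.owner = survivors[0] if survivors else None

    def node_up(self, node):
        """Mark a node alive again; fail back only if configured to."""
        self.up.add(node)
        if self.owner is None:
            # Nothing was running anywhere, so the returning node takes over.
            self.owner = node
        elif self.auto_failback and node == self.preferred:
            # Services migrate back to the preferred node automatically.
            self.owner = node
        # Otherwise the services stay where they are until an admin moves them.
```

With auto_failback=False, taking mail1 down and bringing it back leaves
the services on mail2 until someone moves them by hand, which matches
the manual "set it back as the primary" step described in this post.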

When machine one went down machine two mounted the drbd device, started
mysql, and then started qmail. When machine one came back up I was able
to get it resynced (quite easily) and then set it back as the primary
machine.
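
The takeover order described above (storage first, then mysql, then
qmail) can be sketched as an ordered command list. This is only a
sketch under assumptions: the resource name r0, the device /dev/drbd0,
the mount point /srv/mail and the two service start commands are
hypothetical and will differ per install. In the setup described here
heartbeat drives these steps, not a script like this.

```python
import subprocess

# Hypothetical names, none of them come from the thread.
DRBD_RESOURCE = "r0"
DRBD_DEVICE = "/dev/drbd0"
MOUNT_POINT = "/srv/mail"


def takeover_steps(resource=DRBD_RESOURCE, device=DRBD_DEVICE,
                   mount_point=MOUNT_POINT):
    """Return the takeover commands in dependency order."""
    return [
        ["drbdadm", "primary", resource],  # promote the local DRBD side
        ["mount", device, mount_point],    # make mail stores and DB files visible
        ["/etc/init.d/mysql", "start"],    # assumed init script path
        ["qmailctl", "start"],             # assumed qmail control script
    ]


def run_takeover(steps, runner=subprocess.check_call, dry_run=True):
    """Run the steps in order and return the ones that completed.

    With dry_run=True nothing is executed, the commands are only printed.
    The default runner raises on a non-zero exit status, so a failed
    mount stops the sequence before mysql or qmail are started.
    """
    done = []
    for cmd in steps:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            runner(cmd)
        done.append(cmd)
    return done
```

Calling run_takeover(takeover_steps()) prints the four commands without
touching the system. The ordering matters because mysql and qmail both
read files that live on the drbd device.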

Since you have a 1Gb fiber connection between locations you probably
don't need to worry too much about sync options with drbd (don't know if
you've looked but there are three ways you can sync data).
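
The "three ways" presumably refer to DRBD's replication protocols A, B
and C. The sketch below summarizes them and adds a back-of-envelope
upper bound for a full resync, using the 60GB mailstore and 1Gb link
mentioned in this thread. The function name and the efficiency
parameter are made up for illustration. A real resync normally copies
only the blocks that changed while the link was down, and is also capped
by DRBD's configured resync rate, so treat the figure as a rough
worst case for the raw transfer.

```python
# One-line summaries of DRBD's replication protocols.
DRBD_PROTOCOLS = {
    "A": "asynchronous: a write completes once it is on the local disk "
         "and queued in the local TCP send buffer",
    "B": "memory synchronous: a write completes once it is on the local "
         "disk and has reached the peer node",
    "C": "synchronous: a write completes only after both the local and "
         "the peer disk have confirmed it",
}


def full_resync_seconds(store_gb, link_gbit_per_s, efficiency=1.0):
    """Seconds needed to copy the whole store over the link.

    store_gb        -- size of the store in gigabytes (decimal units)
    link_gbit_per_s -- nominal link speed in gigabits per second
    efficiency      -- assumed usable fraction of the link, 0 < e <= 1
    """
    if link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed and efficiency must be positive")
    gigabits = store_gb * 8  # 1 byte = 8 bits
    return gigabits / (link_gbit_per_s * efficiency)
```

For 60GB over a fully usable 1Gb link, full_resync_seconds(60, 1) gives
480 seconds, so eight minutes. At an assumed 50% usable bandwidth it is
960 seconds. Either way a complete resync after a fiber cut is a matter
of minutes rather than hours for a store of this size.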

So all in all I haven't had any problems with my setup yet *knocks on
wood*. You could easily apply the above to your NFS setup too. My two
cents.


Matt
