Hello Ken,

Friday, November 03, 2000, 1:30:41 AM, you wrote:
> Two identical email servers can be set up to share a private NFS
> partition off a RAID array. Each server provides the exact same
> services as the other machine. Each machine has its own IP. All
> clients are told to connect to a third IP for services. One of the
> servers will start out with the shared IP. If this server ever
> dies and goes offline, the second server, using vqalive, will
> detect the failure and assign the IP to itself, as a hot swap
> over. Clients can then continue to get services, and the IP
> address is switched transparently.

In general, those tools are quite handy. However, they can lull you into
a false sense of security about your failover setup, so you should
always keep one thing in mind when using this or similar IP failover
tools: their whole concept is based on the assumption that a failing
server will either still be able to release its IP to its twin (which
is normally triggered by some signal from the twin), OR will crash so
badly that it no longer responds to anything on that IP, which
effectively frees the IP as well. The failover will therefore fail if
the primary server crashes in a way that it stops answering service
requests but still claims its IP. This state is admittedly strange,
uncommon and hard to describe, but we *had* machines that were
answering pings while the daemons on them weren't accepting any
connections.
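
To illustrate the difference: a ping only proves that the kernel's
network stack is alive, not that the daemon behind the shared IP is.
Below is a minimal, hypothetical Python sketch of a service-level check
(my own illustration, not taken from vqalive; the host, the port, e.g.
25 for SMTP, and the timeout are assumptions):

```python
import socket


def service_alive(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True only if a TCP connection to host:port succeeds.

    Hypothetical helper: a machine that still answers ICMP pings but
    whose daemon no longer accepts connections makes this return False.
    """
    try:
        # create_connection raises OSError (refused, timeout, unreachable)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed example: probe the SMTP port on the local machine.
    print(service_alive("127.0.0.1", 25))
```

Note that such a check only *detects* the hung state; it doesn't solve
the problem, because the twin still can't safely grab the IP while the
primary keeps claiming it.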

When we were trying to deploy a suitable failover system for our
servers, we encountered such a situation several times (although,
BTW, we unfortunately weren't able to reproduce it and therefore
believe it was some kind of race condition; it never happened on any
production machine), and we found only one real solution: you need
some kind of NAT load balancer (this might be ipnat with some custom
hacks, or heavy metal stuff such as the Foundry Networks ServerIron),
an intelligent switch or router, or some other device that is able to
physically disconnect a server from the network, so that its spare
twin can take over the IP without screwing up the network with two
machines claiming the same IP, which is generally a rather bad
thing (TM).

I'm not saying vqalive doesn't work, as I haven't had the time yet to
check it out (which I'll surely do this weekend), but I thought it's
only fair to point out where the limitations of this concept lie. Oh,
and Ken, if my assumptions are wrong, I'd be very interested to know
how you managed to solve the case I described above.



Best regards,
 Gabriel