Hey guys... Does anyone out there have experience with RedHat's HA software? I've downloaded it and am toying around with it, and I have a few issues with the architecture (I'm using the example for setting up a five-node cluster that I got off their website).

Here's the deal: I've got two LVS routers using NAT to pass traffic to two webservers. The primary LVS router has its real interfaces up, plus an alias on the external interface for the virtual webserver address and an alias on the internal interface for the virtual NAT router address. The backup LVS router just has its real interfaces up, and the webservers have theirs up as well, happily oblivious to the fact that they're part of a cluster at all. The pulse heartbeat runs over the real external interfaces of the two LVS machines. When a failure is detected, the backup router brings up its aliased interfaces and takes over the role of primary, receiving traffic for the virtual server and acting as the virtual NAT router for the real webservers. When the failed ex-primary box comes back up, it becomes the hot backup.

Here are my two main issues:

1) Pulse cannot tell the difference between a source failure and a destination failure. I.e., if the backup LVS router (lvs2) for some reason loses its connection to the external network, it will assume the primary has failed because it no longer receives a heartbeat. It will then bring up its virtual aliases, begin ARP spoofing, and try to route. The webservers will all use lvs2 as their router (I deduced this experimentally as well as theoretically). But since lvs2 has no external connection, it cannot route, so the cluster fails.

The same thing might happen if the external network connection on the primary failed: the primary would leave its internal virtual interfaces up, AND the backup would also bring its internal interfaces up, so two machines would be answering the ARPs. I don't know exactly what the result of that scenario would be.
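The usual way out of the source-versus-destination ambiguity is a tiebreaker: before taking over, a node checks that it can still see its own networks, so a silent peer on a healthy network points to a peer fault, while a silent peer on a dead link points to a local fault. The Python sketch below shows only that decision logic. It is hypothetical and not part of pulse: the names `Observations`, `diagnose` and `should_take_over`, and the choice of the upstream gateway and one real server as reference hosts, are all assumptions, and how each probe is actually done (ping, arping, a TCP connect) is left out.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """What a standby router sees at one check interval (hypothetical model)."""
    peer_heartbeat_ok: bool  # heartbeat from the active router arrived in time
    external_ref_ok: bool    # assumed probe: a reference host on the external net (e.g. upstream gateway) answers
    internal_ref_ok: bool    # assumed probe: at least one real server on the internal net answers


def diagnose(obs: Observations) -> str:
    """Classify the situation as 'peer-alive', 'local-fault' or 'peer-fault'."""
    if obs.peer_heartbeat_ok:
        return "peer-alive"
    if not (obs.external_ref_ok and obs.internal_ref_ok):
        # We can't see one of our own networks, so the missing heartbeat is
        # most likely our problem; taking over would only break the cluster.
        return "local-fault"
    # Both of our links work but the peer is silent: blame the peer.
    return "peer-fault"


def should_take_over(obs: Observations) -> bool:
    """Only bring up the virtual aliases when the fault is on the peer's side."""
    return diagnose(obs) == "peer-fault"


if __name__ == "__main__":
    # The standby's external link is dead, its internal link is fine.
    lvs2 = Observations(peer_heartbeat_ok=False, external_ref_ok=False, internal_ref_ok=True)
    print(diagnose(lvs2), should_take_over(lvs2))  # -> local-fault False
```

With the broken-link scenario above, `should_take_over` returns False, so lvs2 would stay passive instead of grabbing the webservers' gateway address. In principle, a matching check on the active router (step down when either reference host is unreachable) would also keep both routers from answering the same ARPs at once.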
Now, I BELIEVE the first problem can be solved by using direct routing instead of NAT. That way, since the webservers would return responses directly to the clients, they would depend on the routers only for incoming requests, and I haven't yet come up with a plausible scenario in which pulse would allow both external virtual interfaces to be up at the same time. Even if it did, who cares, as long as the packets get to the webservers.

2) Pulse only knows about one interface at a time. Therefore, if the internal interface on the primary LVS router goes down, and with it its connection to the webservers, pulse will not transfer control to the backup, because it keeps getting a heartbeat through the external interface. Thus the cluster fails, because lvs1 keeps acting as the virtual server without being able to reach the real servers.

What I need to know is whether there's a better way to configure pulse to handle these situations. Perhaps by running multiple instances that talk to one another and make decisions based on the state of all of the NICs. Or by having it heartbeat more than one server, in an effort to pin down exactly where the failure sits. Or is that asking too much?

Thanks in advance for your advice/answers...

-Brian

-----------------------
Brian J. Sweeney
"I want to know God's thoughts ... the rest are details." -Albert Einstein
Systems Admin, imagedog
[EMAIL PROTECTED]

_______________________________________________
techtalk mailing list
[EMAIL PROTECTED]
http://www.linux.org.uk/mailman/listinfo/techtalk