On 21/05/2013, at 4:19 PM, Nikita Michalko <[email protected]> wrote:
>
> On Tuesday, 21 May 2013 00:00:03, DaveW wrote:
>> We are running heartbeat 2.1.3 on CentOS 5.4. Last Monday AM, I
>
> - Man, so OLD! Any chance to update to the latest version?

In haresources mode, there is very little difference

>
> Nikita Michalko
>
>> received a call while getting ready for work. Our high availability
>> server was not responding. The previous Saturday, our I.T. admins had
>> re-configured the network to expand IP address ranges on some subnets.
>> For whatever reason, this action caused our main server (in a two-node
>> HA configuration) to lose its virtual interface, rendering our
>> high-availability server unavailable.
>>
>> The network worked fine; the nodes could ping each other based on their
>> normal IPs and they could ping the ping node, but the virtual IP (the
>> one we REALLY care about) was ignored. Nothing in the logs, no errors,
>> nothing. Just an unresponsive virtual server. A manual fail-over
>> brought it back quickly as the backup took over. I.T. had done their
>> work on Sat and, had I checked our server on Sunday, I would have found
>> it "unreachable" with a normal ping.
>>
>> When my colleague called me, I asked him what "ifconfig" looked like.
>> He described three interfaces: eth0, eth1 and lo; no eth0:0. I had him
>> initiate the manual fail-over.
>>
>> After poring over the logs, unable to find anything that indicated a
>> problem, I tried to simulate the problem with "ifconfig eth0:0 down".
>> Sure enough, no fail-over, no errors, nothing; just (once again) an
>> unresponsive server. "ifconfig eth0:0 <IP_ADDRESS> up" brought it right
>> back (I tried this last Saturday, BTW, when no one was working). It
>> seems that heartbeat (ipfail?) creates this virtual interface when it
>> starts, then forgets about it. I presume that the assumption is that if
>> eth0 remains intact, eth0:0 will remain intact, as well.
>>
>> Am I missing something in the configuration settings or docs? I find
>> nothing about configuring the backup node to monitor the virtual
>> address, just the other node (which has a different IP and kept working
>> after the network changes). I am about to set up a service to monitor
>> the virtual IP, but I wanted to check with the list, first, to see if
>> there's already been something built in that I have not configured
>> correctly. I have used main.company.com and backup.company.com as the
>> two hostnames of the nodes. Both systems have these names in an
>> /etc/hosts file, along with the hostname and IP of the virtual server
>> and the ping node.
>>
>> My configuration:
>>
>> /etc/ha.d/ha.cf:
>>
>> debugfile /var/log/ha-debug
>> logfile /var/log/ha-log
>> logfacility local0
>> keepalive 2
>> deadtime 10
>> warntime 3
>> initdead 120
>> udpport 694
>> baud 9600
>> serial /dev/ttyS0
>> ucast eth1 10.0.0.1
>> ucast eth1 10.0.0.2
>> auto_failback off
>> node main.company.com backup.company.com
>> ping 129.196.140.130
>> respawn hacluster /usr/lib/heartbeat/ipfail
>> deadping 10
>>
>> /etc/ha.d/haresources
>>
>> main.company.com drbddisk::drbd_resource_0
>> Filesystem::/dev/drbd0::/usr0::ext3 mysql IPaddr::129.196.140.14 httpd
>> smb MailTo::root
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
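
For what it's worth, the external check DaveW says he is about to set up could look roughly like the sketch below. It is only an illustration, not something heartbeat provides: it assumes the iproute2 `ip` command is installed, that the alias created by IPaddr appears in `ip -o -4 addr show` output as an additional `inet` entry, and that 129.196.140.14 (from the haresources line above) is the virtual IP. The function names are made up for this example.

```python
import subprocess


def ipv4_addresses(ip_output):
    """Return the set of IPv4 addresses found in `ip -o -4 addr show` output."""
    addresses = set()
    for line in ip_output.splitlines():
        fields = line.split()
        # Each one-line record looks like "... inet <address>/<prefix> ...".
        for i, field in enumerate(fields[:-1]):
            if field == "inet":
                addresses.add(fields[i + 1].split("/")[0])
    return addresses


def vip_is_configured(ip_output, vip):
    """True if the virtual IP is currently assigned to some local interface."""
    return vip in ipv4_addresses(ip_output)


def read_ip_output():
    """Run `ip -o -4 addr show` and return its output as text (assumes `ip` exists)."""
    proc = subprocess.Popen(["ip", "-o", "-4", "addr", "show"],
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.decode("ascii", "replace")
```

On the active node, something like `vip_is_configured(read_ip_output(), "129.196.140.14")` could be run from cron or a small loop. What to do when it returns False (mail root, re-add the address, or force a fail-over) is a separate decision that this sketch deliberately leaves out.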
