On 5/21/2013 12:34 AM, Andrew Beekhof wrote:
> On 21/05/2013, at 4:19 PM, Nikita Michalko <[email protected]> wrote:
>
>>
>> On Tuesday, 21 May 2013 00:00:03, DaveW wrote:
>>> We are running heartbeat 2.1.3 on CentOS 5.4. Last Monday AM, I
>> - Man, so OLD! Any chance to update to the latest version?
> In haresources mode, there is very little difference
>
>>
>> Nikita Michalko
>>
>>> received a call while getting ready for work. Our high availability
>>> server was not responding. The previous Saturday, our I.T. admins had
>>> re-configured the network to expand IP address ranges on some subnets.
>>> For whatever reason, this action caused our main server (in a two-node
>>> HA configuration) to lose its virtual interface, rendering our
>>> high-availability server unavailable.

This was some kind of human-induced problem. Virtual interfaces don't go
away on their own. Doing an ifdown/ifup on the main interface would do
it. If you're using DHCP (a really bad idea for an HA server) and it
issued a new netmask for the IP, then that would probably do it too.
My guess is that someone did the ifdown/ifup to fix the netmask; from
what you said, that would have been necessary. And that would definitely
do it.

Pacemaker would have brought it back up again. The haresources
configuration doesn't monitor any services, so it doesn't know whether
they're working or not; it only monitors servers for up/down status. It
does what it does quite well, and it's very simple to set up. But it
doesn't do everything that Pacemaker does.

>>>
>>> The network worked fine; the nodes could ping each other based on their
>>> normal IPs and they could ping the ping node, but the virtual IP (the
>>> one we REALLY care about) was ignored. Nothing in the logs, no errors,
>>> nothing. Just an unresponsive virtual server. A manual fail-over
>>> brought it back quickly as the backup took over. I.T. had done their
>>> work on Sat and, had I checked our server on Sunday, I would have found
>>> it "unreachable" with a normal ping.
>>>
>>> When my colleague called me, I asked him what "ifconfig" looked like.
>>> He described three interfaces: eth0, eth1 and lo; no eth0:0. I had him
>>> initiate the manual fail-over.
>>>
>>> After poring over the logs, unable to find anything that indicated a
>>> problem, I tried to simulate the problem with "ifconfig eth0:0 down".
>>> Sure enough, no fail-over, no errors, nothing; just (once again) an
>>> unresponsive server. "ifconfig eth0:0 <IP_ADDRESS> up" brought it right
>>> back (I tried this last Saturday, BTW, when no one was working). It
>>> seems that heartbeat (ipfail?) creates this virtual interface when it
>>> starts, then forgets about it. I presume that the assumption is that if
>>> eth0 remains intact, eth0:0 will remain intact, as well.
>>>
>>> Am I missing something in the configuration settings or docs? I find
>>> nothing about configuring the backup node to monitor the virtual
>>> address, just the other node (which has a different IP and kept working
>>> after the network changes).
>>> I am about to set up a service to monitor
>>> the virtual IP, but I wanted to check with the list, first, to see if
>>> there's already been something built in that I have not configured
>>> correctly. I have used main.company.com and backup.company.com as the
>>> two hostnames of the nodes. Both systems have these names in an
>>> /etc/hosts file, along with the hostname and IP of the virtual server
>>> and the ping node.
>>>
>>> My configuration:
>>>
>>> /etc/ha.d/ha.cf:
>>>
>>> debugfile /var/log/ha-debug
>>> logfile /var/log/ha-log
>>> logfacility local0
>>> keepalive 2
>>> deadtime 10
>>> warntime 3
>>> initdead 120
>>> udpport 694
>>> baud 9600
>>> serial /dev/ttyS0
>>> ucast eth1 10.0.0.1
>>> ucast eth1 10.0.0.2
>>> auto_failback off
>>> node main.company.com backup.company.com
>>> ping 129.196.140.130
>>> respawn hacluster /usr/lib/heartbeat/ipfail
>>> deadping 10
>>>
>>> /etc/ha.d/haresources
>>>
>>> main.company.com drbddisk::drbd_resource_0
>>> Filesystem::/dev/drbd0::/usr0::ext3 mysql IPaddr::129.196.140.14 httpd
>>> smb MailTo::root
>>>
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
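
P.S. On the plan to write a service that watches the virtual IP: a rough
sketch of such a check follows, in case it is useful. It is not part of
heartbeat. It assumes a Python 3 interpreter and the iproute2 "ip"
command on the node (a stock CentOS 5.4 ships an older Python, so treat
that as an assumption), and the names address_present and
vip_is_configured are made up for illustration. The address is the IPaddr
resource from the haresources line quoted above.

```python
import subprocess

# The IPaddr resource from the haresources line quoted above.
VIP = "129.196.140.14"


def address_present(ip_output, address):
    """Return True if `address` is listed as an IPv4 address in the
    text produced by `ip -o -4 addr show` (one record per line)."""
    for line in ip_output.splitlines():
        fields = line.split()
        # A record looks roughly like:
        #   "2: eth0    inet 129.196.140.14/24 ... scope global secondary eth0:0"
        if "inet" not in fields:
            continue
        idx = fields.index("inet")
        if idx + 1 >= len(fields):
            continue
        # Strip the prefix length ("/24") before comparing, so that
        # 129.196.140.140 is not mistaken for 129.196.140.14.
        if fields[idx + 1].split("/")[0] == address:
            return True
    return False


def vip_is_configured(address=VIP):
    """Ask the local node for its IPv4 addresses and look for `address`.
    Hypothetical helper: assumes `ip` is on the PATH."""
    output = subprocess.check_output(["ip", "-o", "-4", "addr", "show"])
    return address_present(output.decode("ascii", "replace"), address)
```

Run from cron on whichever node currently holds the resources, a False
result from vip_is_configured() would be the cue to alert someone or to
start a manual fail-over. On the standby node the address is expected to
be absent, so the check only makes sense on the active one.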
