On 5/21/2013 12:34 AM, Andrew Beekhof wrote:
> On 21/05/2013, at 4:19 PM, Nikita Michalko <[email protected]> wrote:
>
>>
>> On Tuesday, 21 May 2013 00:00:03, DaveW wrote:
>>> We are running heartbeat 2.1.3 on CentOS 5.4.  Last Monday AM, I
>> - Man, so OLD! Any chance of updating to the latest version?
> In haresources mode, there is very little difference
>
>>
>> Nikita Michalko
>>
>>> received a call while getting ready for work.  Our high availability
>>> server was not responding.  The previous Saturday, our I.T. admins had
>>> re-configured the network to expand IP address ranges on some subnets.
>>> For whatever reason, this action caused our main server (in a two-node
>>> HA configuration) to lose its virtual interface, rendering our
>>> high-availability server unavailable.
This was some kind of human-induced problem.  Virtual interfaces don't go
away on their own.  Doing an ifdown/ifup on the main interface would remove
the alias.  If you're using DHCP (a really bad idea for an HA server) and it
issued a new netmask for the IP, then that would probably do it too.

My guess is that someone did the ifdown/ifup to apply the new netmask.  From
what you said, that would have been necessary.  And that would definitely
take the virtual interface down.

Pacemaker would have brought it back up again, because it can monitor the
resources it manages.  The haresources configuration doesn't monitor any
services, so it doesn't know whether they're working or not.  It only
monitors servers for up/down status.

It does what it does quite well - and it's very simple to set up.  But 
it doesn't do everything that Pacemaker does.
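
If you stay with haresources, an external check for the address is one way
to cover that gap.  Below is a minimal sketch in Python, not something
heartbeat ships.  It parses the output of "ip -o -4 addr show" and reports
whether the virtual IP from your haresources file (129.196.140.14) is still
configured on the node.  The function names are made up for illustration,
and it assumes Python 3.7 or later and the iproute2 "ip" command are
available on the node.

```python
import re
import subprocess

# Matches the "inet A.B.C.D/len" field in `ip -o -4 addr show` output.
INET_RE = re.compile(r"\binet (\d{1,3}(?:\.\d{1,3}){3})/\d+")


def configured_addresses(ip_output):
    """Return the set of IPv4 addresses listed in `ip -o -4 addr show` output."""
    return set(INET_RE.findall(ip_output))


def vip_is_configured(vip, ip_output=None):
    """Return True if `vip` is assigned to any local interface.

    `ip_output` may be passed in for testing; otherwise the `ip` command
    is run to collect the current address list.
    """
    if ip_output is None:
        ip_output = subprocess.run(
            ["ip", "-o", "-4", "addr", "show"],
            capture_output=True, text=True, check=True,
        ).stdout
    # Compare whole addresses so 129.196.140.14 never matches 129.196.140.140.
    return vip in configured_addresses(ip_output)
```

Run periodically on the active node (from cron, say), a False result from
vip_is_configured("129.196.140.14") is the condition you saw: eth0 is up but
the alias is gone.  What to do then (send mail, re-add the address, or
trigger a fail-over) is deliberately left out of the sketch.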

>>>
>>> The network worked fine; the nodes could ping each other based on their
>>> normal IPs and they could ping the ping node, but the virtual IP (the
>>> one we REALLY care about) was ignored.  Nothing in the logs, no errors,
>>> nothing.   Just an unresponsive virtual server.  A manual fail-over
>>> brought it back quickly as the backup took over.  I.T. had done their
>>> work on Sat and, had I checked our server on Sunday, I would have found
>>> it "unreachable" with a normal ping.
>>>
>>> When my colleague called me, I asked him what "ifconfig" looked like.
>>> He described three interfaces; eth0, eth1 and lo; no eth0:0. I had him
>>> initiate the manual fail-over.
>>>
>>> After poring over the logs, unable to find anything that indicated a
>>> problem, I tried to simulate the problem with "ifconfig eth0:0 down".
>>> Sure enough, no fail-over, no errors, nothing; just (once again) an
>>> unresponsive server.  "ifconfig eth0:0 <IP_ADDRESS> up" brought it right
>>> back (I tried this last Saturday, BTW, when no one was working).  It
>>> seems that heartbeat (ipfail?) creates this virtual interface when it
>>> starts, then forgets about it.  I presume that the assumption is that if
>>> eth0 remains intact, eth0:0 will remain intact, as well.
>>>
>>> Am I missing something in the configuration settings or docs?  I find
>>> nothing about configuring the backup node to monitor the virtual
>>> address, just the other node (which has a different IP and kept working
>>> after the network changes).  I am about to set up a service to monitor
>>> the virtual IP, but I wanted to check with the list, first, to see if
>>> there's already been something built in that I have not configured
>>> correctly.  I have used main.company.com and backup.company.com as the
>>> two hostnames of the nodes.  Both systems have these names in an
>>> /etc/hosts file, along with the hostname and IP of the virtual server
>>> and the ping node.
>>>
>>> My configuration:
>>>
>>> /etc/ha.d/ha.cf:
>>>
>>> debugfile /var/log/ha-debug
>>> logfile    /var/log/ha-log
>>> logfacility    local0
>>> keepalive 2
>>> deadtime 10
>>> warntime 3
>>> initdead 120
>>> udpport    694
>>> baud    9600
>>> serial    /dev/ttyS0
>>> ucast eth1 10.0.0.1
>>> ucast eth1 10.0.0.2
>>> auto_failback off
>>> node main.company.com backup.company.com
>>> ping 129.196.140.130
>>> respawn hacluster /usr/lib/heartbeat/ipfail
>>> deadping 10
>>>
>>> /etc/ha.d/haresources:
>>>
>>> main.company.com drbddisk::drbd_resource_0
>>> Filesystem::/dev/drbd0::/usr0::ext3 mysql IPaddr::129.196.140.14 httpd
>>> smb MailTo::root
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>>
