Good day all,


Here is my setup:

- 2-node cluster
- heartbeat v3.0.0-33.2
- RHEL v5.2
- 2-NIC bond (the issue also happens without the bond configured)


ha.cf:
(cluster node host names and IP addresses substituted)

logfacility     daemon
keepalive 1
deadtime 10
deadping 10
warntime 5
initdead 120
udpport 694
bcast bond0
ping 1.1.1.1
auto_failback off
node    node1
node    node2
respawn hacluster /usr/lib64/heartbeat/ipfail
use_logd no
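
For anyone who wants to sanity-check the timing values above, here is a minimal sketch in Python. The helper names parse_ha_cf and check_timings are made up for illustration and are not part of heartbeat. It assumes every timing value is a plain number of seconds (no "ms" suffix), and the "initdead at least twice deadtime" rule is the commonly quoted ha.cf guideline as I remember it, so treat that check as an assumption.

```python
# Sketch only: parse_ha_cf and check_timings are hypothetical helpers,
# not part of heartbeat itself.

HA_CF = """\
logfacility     daemon
keepalive 1
deadtime 10
deadping 10
warntime 5
initdead 120
udpport 694
bcast bond0
ping 1.1.1.1
auto_failback off
node    node1
node    node2
respawn hacluster /usr/lib64/heartbeat/ipfail
use_logd no
"""


def parse_ha_cf(text):
    """Return {directive: [value, ...]}; directives such as 'node' may repeat."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)  # directive, then the rest of the line
        value = parts[1].strip() if len(parts) > 1 else ""
        config.setdefault(parts[0], []).append(value)
    return config


def check_timings(config):
    """Return a list of warnings; assumes values are whole seconds."""
    def seconds(name):
        return int(config[name][0])

    problems = []
    if not seconds("keepalive") < seconds("warntime") < seconds("deadtime"):
        problems.append("expected keepalive < warntime < deadtime")
    # Assumed guideline: initdead should be at least twice deadtime.
    if seconds("initdead") < 2 * seconds("deadtime"):
        problems.append("initdead is less than twice deadtime")
    return problems


if __name__ == "__main__":
    cfg = parse_ha_cf(HA_CF)
    print("nodes:", cfg["node"])
    print("ping nodes:", cfg["ping"])
    print("timing warnings:", check_timings(cfg))
```

With the values posted here the checks report no warnings, so the timing settings themselves do not look like the cause.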


ha.resources:
(cluster node host names and IP addresses substituted)

node1 IPaddr::1.1.1.3
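
To spell out how that line reads: the first field is the preferred node, and each following field is a resource script with its arguments separated by "::". A small Python sketch of that reading follows; parse_haresources_line is a hypothetical helper name used only for illustration, and it does not handle every form a resources file allows.

```python
# Sketch only: parse_haresources_line is a hypothetical helper, not a
# heartbeat tool. It handles the simple "node script::arg::arg" form.

def parse_haresources_line(line):
    """Split one resources line into (preferred_node, [(script, [args])])."""
    fields = line.split()
    node, resources = fields[0], fields[1:]
    parsed = []
    for resource in resources:
        script, *args = resource.split("::")
        parsed.append((script, args))
    return node, parsed


if __name__ == "__main__":
    node, resources = parse_haresources_line("node1 IPaddr::1.1.1.3")
    print("preferred node:", node)
    for script, args in resources:
        # Matches the log line "Running /etc/ha.d/resource.d/IPaddr 1.1.1.3 start"
        print("resource script:", script, "args:", args)
```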





My Problem:

- Node1 currently owns the IPaddr resource
- Node1 gets disconnected from the network
- Node2 starts the resource as expected
- Node1 still holds on to the IPaddr resource


Shouldn't node1 release the resource once it can no longer reach the ping node (1.1.1.1)?





Node1's Log:
(cluster node host names and IP addresses substituted)

--------------------------------------
Dec  9 18:18:02 node1 ipfail: [17330]: info: Status update: Node 172.20.7.1 now has status dead
Dec  9 18:18:02 node1 heartbeat: [17301]: WARN: node node2: is dead
Dec  9 18:18:02 node1 heartbeat: [17301]: WARN: No STONITH device configured.
Dec  9 18:18:02 node1 heartbeat: [17301]: WARN: Shared disks are not protected.
Dec  9 18:18:02 node1 heartbeat: [17301]: info: Resources being acquired from node2.
Dec  9 18:18:02 node1 heartbeat: [17301]: info: Link 1.1.1.1:1.1.1.1 dead.
Dec  9 18:18:02 node1 heartbeat: [17301]: info: Link node2:bond0 dead.
Dec  9 18:18:02 node1 harc[20177]: info: Running /etc/ha.d/rc.d/status status
Dec  9 18:18:02 node1 IPaddr[20250]: INFO:  Running OK
Dec  9 18:18:02 node1 heartbeat: [20178]: info: Local Resource acquisition completed.
Dec  9 18:18:02 node1 ipfail: [17330]: info: NS: We are dead. :<
Dec  9 18:18:02 node1 ipfail: [17330]: info: Status update: Node node2 now has status dead
Dec  9 18:18:02 node1 harc[20279]: info: Running /etc/ha.d/rc.d/status status
Dec  9 18:18:02 node1 mach_down[20299]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec  9 18:18:02 node1 mach_down[20299]: info: mach_down takeover complete for node node2.
Dec  9 18:18:02 node1 heartbeat: [17301]: info: mach_down takeover complete.
Dec  9 18:18:03 node1 ipfail: [17330]: info: NS: We are dead. :<
Dec  9 18:18:03 node1 ipfail: [17330]: info: Link Status update: Link 1.1.1.1/1.1.1.1 now has status dead
Dec  9 18:18:04 node1 ipfail: [17330]: info: We are dead. :<
Dec  9 18:18:04 node1 ipfail: [17330]: info: Asking other side for ping node count.
Dec  9 18:18:04 node1 ipfail: [17330]: info: Link Status update: Link node2/bond0 now has status dead
Dec  9 18:18:05 node1 ipfail: [17330]: info: We are dead. :<
Dec  9 18:18:05 node1 ipfail: [17330]: info: Asking other side for ping node count.
-----------------------------------------




Node2's Log:
(cluster node host names and IP addresses substituted)

----------------------------------------
Dec  9 18:25:54 node2 heartbeat: [17883]: WARN: node node1: is dead
Dec  9 18:25:54 node2 ipfail: [17915]: info: Status update: Node node1 now has status dead
Dec  9 18:25:54 node2 heartbeat: [17883]: WARN: No STONITH device configured.
Dec  9 18:25:54 node2 heartbeat: [17883]: WARN: Shared disks are not protected.
Dec  9 18:25:54 node2 heartbeat: [17883]: info: Resources being acquired from node1.
Dec  9 18:25:54 node2 heartbeat: [17883]: info: Link node1:bond0 dead.
Dec  9 18:25:54 node2 harc[17957]: info: Running /etc/ha.d/rc.d/status status
Dec  9 18:25:54 node2 heartbeat: [17958]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys node2] to acquire.
Dec  9 18:25:54 node2 mach_down[17992]: info: Taking over resource group IPaddr::1.1.1.3
Dec  9 18:25:54 node2 ResourceManager[18022]: info: Acquiring resource group: node1 IPaddr::1.1.1.3
Dec  9 18:25:54 node2 IPaddr[18051]: INFO:  Resource is stopped
Dec  9 18:25:54 node2 ResourceManager[18022]: info: Running /etc/ha.d/resource.d/IPaddr 1.1.1.3 start
Dec  9 18:25:54 node2 IPaddr[18114]: INFO: Using calculated nic for 1.1.1.3: bond0
Dec  9 18:25:54 node2 IPaddr[18114]: INFO: Using calculated netmask for 1.1.1.3: 255.255.255.0
Dec  9 18:25:54 node2 IPaddr[18114]: INFO: eval ifconfig bond0:0 1.1.1.3 netmask 255.255.255.0 broadcast 1.1.1.255
Dec  9 18:25:54 node2 IPaddr[18098]: INFO:  Success
Dec  9 18:25:54 node2 mach_down[17992]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec  9 18:25:54 node2 mach_down[17992]: info: mach_down takeover complete for node node1.
Dec  9 18:25:54 node2 heartbeat: [17883]: info: mach_down takeover complete.
Dec  9 18:25:55 node2 ipfail: [17915]: info: NS: We are still alive!
Dec  9 18:25:55 node2 ipfail: [17915]: info: Link Status update: Link node1/bond0 now has status dead
Dec  9 18:25:56 node2 ipfail: [17915]: info: Asking other side for ping node count.
Dec  9 18:25:56 node2 ipfail: [17915]: info: Checking remote count of ping nodes.
Dec  9 18:26:04 node2 IPaddr[18114]: ERROR: Could not send gratuitous arps. rc=1
------------------------------------------
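
Syslog excerpts pasted into mail are often hard-wrapped, which makes the two logs awkward to compare. Below is a minimal Python sketch that rejoins wrapped lines and pulls out what the IPaddr script reported on each node. The function names (unwrap, parse_entry, ipaddr_reports) are hypothetical, the sample lines are copied from the logs above, and the regular expressions only assume the "Mon DD HH:MM:SS host proc[pid]:" layout seen in this post.

```python
import re

# Sketch only: unwrap, parse_entry and ipaddr_reports are hypothetical helpers.
# The sample lines are taken from the two logs in this post.
SAMPLE = """\
Dec  9 18:18:02 node1 ipfail: [17330]: info: Status update: Node node2
now has status dead
Dec  9 18:18:02 node1 IPaddr[20250]: INFO:  Running OK
Dec  9 18:25:54 node2 ResourceManager[18022]: info:
Running /etc/ha.d/resource.d/IPaddr 1.1.1.3 start
Dec  9 18:25:54 node2 IPaddr[18098]: INFO:  Success
"""

# A new log entry starts with a syslog timestamp such as "Dec  9 18:18:02 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} ")

# Both "ipfail: [17330]:" and "IPaddr[20250]:" styles appear in the logs.
ENTRY = re.compile(
    r"^(?P<time>[A-Z][a-z]{2} +\d{1,2} [\d:]{8}) (?P<host>\S+) "
    r"(?P<proc>[\w.-]+)(?:: )?\[(?P<pid>\d+)\]: *(?P<msg>.*)$"
)


def unwrap(text):
    """Join continuation lines onto the timestamped line they belong to."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or set(stripped) == {"-"}:
            continue  # skip blank lines and dashed separators
        if TIMESTAMP.match(line) or not entries:
            entries.append(stripped)
        else:
            entries[-1] += " " + stripped
    return entries


def parse_entry(entry):
    """Return a dict of fields, or None if the entry does not match."""
    match = ENTRY.match(entry)
    return match.groupdict() if match else None


def ipaddr_reports(text):
    """List (host, message) pairs logged by the IPaddr resource script."""
    parsed = (parse_entry(e) for e in unwrap(text))
    return [(p["host"], p["msg"]) for p in parsed if p and p["proc"] == "IPaddr"]


if __name__ == "__main__":
    for host, msg in ipaddr_reports(SAMPLE):
        print(host, "->", msg)
```

Run against the full logs, this makes the symptom easy to see: IPaddr reports "Running OK" on node1 and "Success" for the start on node2, so both nodes believe they hold 1.1.1.3.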



Am I doing something wrong?
Anyone else having this issue?

Any help is much appreciated.

Thanks
-Josh


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
