I woke up this morning and discovered that one of my clusters had failed
over during the night. Everything was working fine, but I wanted to know
what happened. From reading the logs, it looks to like the primary node
ftp02 gave up its resources to the secondary node ftp01, which had
become overloaded and failed to report in within the deadtime. (ftp01
was busy taking a backup and was experiencing high iowait, which I think
explains why it failed to send heartbeats soon enough.)
 
Why would apparently healthy node ftp02 give its resources to the
overloaded ftp01?
 
Here is the pertinent section of the log from ftp02, which was primary
before the failover:
 
 
 
heartbeat[5391]: 2011/03/03_04:03:20 info: Link
ftp01.mycharts.md:/dev/ttyS0 dead.
heartbeat[5391]: 2011/03/03_04:03:20 info: Link ftp01.mycharts.md:bond0
dead.
ipfail[5775]: 2011/03/03_04:03:20 info: Link Status update: Link
ftp01.mycharts.md//dev/ttyS0 now has status dead
heartbeat[5391]: 2011/03/03_04:03:20 info: Link ftp01.mycharts.md:eth2
dead.
ipfail[5775]: 2011/03/03_04:03:20 debug: Found ping node 192.168.10.100!
ipfail[5775]: 2011/03/03_04:03:20 info: Asking other side for ping node
count.
ipfail[5775]: 2011/03/03_04:03:20 debug: Message [num_ping] sent.
ipfail[5775]: 2011/03/03_04:03:20 info: Checking remote count of ping
nodes.
ipfail[5775]: 2011/03/03_04:03:20 info: Link Status update: Link
ftp01.mycharts.md/bond0 now has status dead
ipfail[5775]: 2011/03/03_04:03:21 debug: Found ping node 192.168.10.100!
heartbeat[5391]: 2011/03/03_04:03:21 WARN: node ftp01.mycharts.md: is
dead
heartbeat[5391]: 2011/03/03_04:03:21 info: Dead node ftp01.mycharts.md
gave up resources.
heartbeat[5391]: 2011/03/03_04:03:21 debug:
hb_rsc_recover_dead_resources: other now stable
ipfail[5775]: 2011/03/03_04:03:21 info: Asking other side for ping node
count.
ipfail[5775]: 2011/03/03_04:03:21 debug: Message [num_ping] sent.
ipfail[5775]: 2011/03/03_04:03:21 info: Checking remote count of ping
nodes.
ipfail[5775]: 2011/03/03_04:03:21 info: Link Status update: Link
ftp01.mycharts.md/eth2 now has status dead
ipfail[5775]: 2011/03/03_04:03:21 debug: Found ping node 192.168.10.100!
ipfail[5775]: 2011/03/03_04:03:21 info: Asking other side for ping node
count.
ipfail[5775]: 2011/03/03_04:03:21 debug: Message [num_ping] sent.
ipfail[5775]: 2011/03/03_04:03:21 info: Checking remote count of ping
nodes.
ipfail[5775]: 2011/03/03_04:03:21 info: Status update: Node
ftp01.mycharts.md now has status dead
ipfail[5775]: 2011/03/03_04:03:22 debug: Found ping node 192.168.10.100!
ipfail[5775]: 2011/03/03_04:03:22 info: NS: We are still alive!
heartbeat[5391]: 2011/03/03_04:03:38 CRIT: Cluster node
ftp01.mycharts.md returning after partition.
heartbeat[5391]: 2011/03/03_04:03:38 info: For information on cluster
partitions, See URL: http://linux-ha.org/SplitBrain
<http://linux-ha.org/SplitBrain> 
heartbeat[5391]: 2011/03/03_04:03:38 WARN: Deadtime value may be too
small.
heartbeat[5391]: 2011/03/03_04:03:38 info: See FAQ for information on
tuning deadtime.
heartbeat[5391]: 2011/03/03_04:03:38 info: URL:
http://linux-ha.org/FAQ#heavy_load <http://linux-ha.org/FAQ#heavy_load> 
heartbeat[5391]: 2011/03/03_04:03:38 info: Link
ftp01.mycharts.md:/dev/ttyS0 up.
heartbeat[5391]: 2011/03/03_04:03:38 WARN: Late heartbeat: Node
ftp01.mycharts.md: interval 49030 ms
heartbeat[5391]: 2011/03/03_04:03:38 info: Status update for node
ftp01.mycharts.md: status active
ipfail[5775]: 2011/03/03_04:03:38 info: Link Status update: Link
ftp01.mycharts.md//dev/ttyS0 now has status up
heartbeat[5391]: 2011/03/03_04:03:38 debug: Status seqno: 568925
msgtime: 1299153818
heartbeat[5391]: 2011/03/03_04:03:38 debug: StartNextRemoteRscReq() -
calling hook
heartbeat[5391]: 2011/03/03_04:03:38 debug: notify_world: invoking harc:
OLD status: active
heartbeat[5391]: 2011/03/03_04:03:38 debug: Process [status] started pid
16973
heartbeat[5391]: 2011/03/03_04:03:38 debug: Starting notify process
[status]
ipfail[5775]: 2011/03/03_04:03:38 info: Status update: Node
ftp01.mycharts.md now has status active
heartbeat[5391]: 2011/03/03_04:03:38 info: Link ftp01.mycharts.md:bond0
up.
ipfail[5775]: 2011/03/03_04:03:38 info: Link Status update: Link
ftp01.mycharts.md/bond0 now has status up
heartbeat[5391]: 2011/03/03_04:03:38 info: Link ftp01.mycharts.md:eth2
up.
heartbeat[16973]: 2011/03/03_04:03:38 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
ipfail[5775]: 2011/03/03_04:03:38 info: Link Status update: Link
ftp01.mycharts.md/eth2 now has status up
heartbeat[16973]: 2011/03/03_04:03:38 debug: notify_world: Running harc
status
harc[16973]: 2011/03/03_04:03:38 info: Running /etc/ha.d/rc.d/status
status
heartbeat[5391]: 2011/03/03_04:03:38 info: Managed status process 16973
exited with return code 0.
heartbeat[5391]: 2011/03/03_04:03:38 debug: RscMgmtProc 'status' exited
code 0
heartbeat[5391]: 2011/03/03_04:03:40 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[5391]: 2011/03/03_04:03:40 info: hb_giveup_resources():
current status: active
heartbeat[5391]: 2011/03/03_04:03:40 debug: Sending hold resources msg:
none, stable=0 # shutdown
heartbeat[5391]: 2011/03/03_04:03:40 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 5
heartbeat[5391]: 2011/03/03_04:03:40 info: Heartbeat shutdown in
progress. (5391)
heartbeat[5391]: 2011/03/03_04:03:40 debug: Process
[hb_giveup_resources] started pid 17023
heartbeat[17023]: 2011/03/03_04:03:40 info: Giving up all HA resources.
ResourceManager[17036]: 2011/03/03_04:03:40 info: Releasing resource
group: ftp01.mycharts.md drbddisk::ha_ftp
Filesystem::/dev/drbd0::/ha_ftp::ext3::noatime
IPaddr2::192.168.10.203/24/bond0 pure-ftpd nfs
ResourceManager[17036]: 2011/03/03_04:03:40 info: Running
/etc/init.d/nfs  stop
ResourceManager[17036]: 2011/03/03_04:03:40 debug: Starting
/etc/init.d/nfs  stop
Shutting down NFS mountd: [  OK  ]
Shutting down NFS daemon: [  OK  ]
Shutting down NFS quotas: [  OK  ]
Shutting down NFS services:  [  OK  ]
ResourceManager[17036]: 2011/03/03_04:03:41 debug: /etc/init.d/nfs  stop
done. RC=0
ResourceManager[17036]: 2011/03/03_04:03:41 info: Running
/etc/init.d/pure-ftpd  stop
ResourceManager[17036]: 2011/03/03_04:03:41 debug: Starting
/etc/init.d/pure-ftpd  stop
Stopping pure-ftpd: [  OK  ]
ResourceManager[17036]: 2011/03/03_04:03:41 debug: /etc/init.d/pure-ftpd
stop done. RC=0
ResourceManager[17036]: 2011/03/03_04:03:41 info: Running
/etc/ha.d/resource.d/IPaddr2 192.168.10.203/24/bond0 stop
ResourceManager[17036]: 2011/03/03_04:03:41 debug: Starting
/etc/ha.d/resource.d/IPaddr2 192.168.10.203/24/bond0 stop
IPaddr2[17196]: 2011/03/03_04:03:41 INFO: ip -f inet addr delete
192.168.10.203/24 dev bond0
IPaddr2[17196]: 2011/03/03_04:03:41 INFO: ip -o -f inet addr show bond0
IPaddr2[17167]: 2011/03/03_04:03:41 INFO:  Success
INFO:  Success
ResourceManager[17036]: 2011/03/03_04:03:41 debug:
/etc/ha.d/resource.d/IPaddr2 192.168.10.203/24/bond0 stop done. RC=0
ResourceManager[17036]: 2011/03/03_04:03:41 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_ftp ext3 noatime stop
ResourceManager[17036]: 2011/03/03_04:03:41 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_ftp ext3 noatime stop
Filesystem[17277]: 2011/03/03_04:03:41 INFO: Running stop for /dev/drbd0
on /ha_ftp
Filesystem[17277]: 2011/03/03_04:03:41 INFO: Trying to unmount /ha_ftp
umount: /ha_ftp: device is busy
umount: /ha_ftp: device is busy
Filesystem[17277]: 2011/03/03_04:03:41 ERROR: Couldn't unmount /ha_ftp;
trying cleanup with SIGTERM
/ha_ftp:              8796c
Filesystem[17277]: 2011/03/03_04:03:42 INFO: Some processes on /ha_ftp
were signalled
ipfail[5775]: 2011/03/03_04:03:45 info: No giveup timer to abort.
ipfail[5775]: 2011/03/03_04:03:47 info: No giveup timer to abort.
ipfail[5775]: 2011/03/03_04:03:48 info: No giveup timer to abort.
Filesystem[17277]: 2011/03/03_04:03:54 INFO: unmounted /ha_ftp
successfully
Filesystem[17266]: 2011/03/03_04:03:54 INFO:  Success
INFO:  Success
ResourceManager[17036]: 2011/03/03_04:03:54 debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_ftp ext3 noatime stop
done. RC=0
ResourceManager[17036]: 2011/03/03_04:03:54 info: Running
/etc/ha.d/resource.d/drbddisk ha_ftp stop
ResourceManager[17036]: 2011/03/03_04:03:54 debug: Starting
/etc/ha.d/resource.d/drbddisk ha_ftp stop
ResourceManager[17036]: 2011/03/03_04:03:54 debug:
/etc/ha.d/resource.d/drbddisk ha_ftp stop done. RC=0
heartbeat[17023]: 2011/03/03_04:03:54 info: All HA resources
relinquished.
heartbeat[17023]: 2011/03/03_04:03:54 debug: Sending T_SHUTDONE.
heartbeat[17023]: 2011/03/03_04:03:54 info: FIFO message [type shutdone]
written rc=27
heartbeat[5391]: 2011/03/03_04:03:54 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 5
heartbeat[5391]: 2011/03/03_04:03:54 debug: Received T_SHUTDONE from us.
heartbeat[5391]: 2011/03/03_04:03:54 debug: Calling
hb_mcp_final_shutdown in a second.
heartbeat[5391]: 2011/03/03_04:03:54 debug: RscMgmtProc
'hb_giveup_resources' exited code 0
heartbeat[5391]: 2011/03/03_04:03:55 debug: hb_mcp_final_shutdown()
phase 0
heartbeat[5391]: 2011/03/03_04:03:55 debug: Shutting down client
/usr/lib/heartbeat/ipfail
heartbeat[5391]: 2011/03/03_04:03:55 info: killing
/usr/lib/heartbeat/ipfail process group 5775 with signal 15
heartbeat[5391]: 2011/03/03_04:03:55 info: other_holds_resources: 3
heartbeat[5391]: 2011/03/03_04:03:55 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 5
heartbeat[5391]: 2011/03/03_04:03:56 debug: G_remove_client(pid=5775,
reason='died' gsource=0x9e4b348) {
heartbeat[5391]: 2011/03/03_04:03:56 debug: api_remove_client_int:
removing pid [5775] reason: died
heartbeat[5391]: 2011/03/03_04:03:56 debug: api_send_client: client 5775
died
heartbeat[5391]: 2011/03/03_04:03:56 debug: }/*G_remove_client;*/
heartbeat[5391]: 2011/03/03_04:03:56 debug: Final client
"/usr/lib/heartbeat/ipfail" died.
heartbeat[5391]: 2011/03/03_04:03:56 debug: hb_mcp_final_shutdown()
phase 1
heartbeat[5391]: 2011/03/03_04:03:57 debug: hb_mcp_final_shutdown()
phase 2
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBREAD process 5500
with signal 15
heartbeat[5497]: 2011/03/03_04:03:57 debug: Process 5497 processing
SIGTERM
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBWRITE process 5501
with signal 15
heartbeat[5497]: 2011/03/03_04:03:57 debug: Exiting from pid 5497
[rc=15]
heartbeat[5503]: 2011/03/03_04:03:57 debug: Process 5503 processing
SIGTERM
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBREAD process 5502
with signal 15
heartbeat[5503]: 2011/03/03_04:03:57 debug: Exiting from pid 5503
[rc=15]
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBWRITE process 5503
with signal 15
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBREAD process 5504
with signal 15
heartbeat[5501]: 2011/03/03_04:03:57 debug: Process 5501 processing
SIGTERM
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBWRITE process 5494
with signal 15
heartbeat[5504]: 2011/03/03_04:03:57 debug: Process 5504 processing
SIGTERM
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBREAD process 5495
with signal 15
heartbeat[5501]: 2011/03/03_04:03:57 debug: Exiting from pid 5501
[rc=15]
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBFIFO process 5419
with signal 15
heartbeat[5419]: 2011/03/03_04:03:57 debug: Process 5419 processing
SIGTERM
heartbeat[5502]: 2011/03/03_04:03:57 debug: Process 5502 processing
SIGTERM
heartbeat[5502]: 2011/03/03_04:03:57 debug: Exiting from pid 5502
[rc=15]
heartbeat[5500]: 2011/03/03_04:03:57 debug: Process 5500 processing
SIGTERM
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBWRITE process 5497
with signal 15
heartbeat[5494]: 2011/03/03_04:03:57 debug: Process 5494 processing
SIGTERM
heartbeat[5494]: 2011/03/03_04:03:57 debug: Calling localdie() function
heartbeat[5495]: 2011/03/03_04:03:57 debug: Process 5495 processing
SIGTERM
heartbeat[5495]: 2011/03/03_04:03:57 debug: Exiting from pid 5495
[rc=15]
heartbeat[5499]: 2011/03/03_04:03:57 debug: Process 5499 processing
SIGTERM
heartbeat[5498]: 2011/03/03_04:03:57 debug: Process 5498 processing
SIGTERM
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBREAD process 5498
with signal 15
heartbeat[5494]: 2011/03/03_04:03:57 debug: glib: serial_localdie:
Flushing tty
heartbeat[5499]: 2011/03/03_04:03:57 debug: Exiting from pid 5499
[rc=15]
heartbeat[5494]: 2011/03/03_04:03:57 debug: Exiting from pid 5494
[rc=15]
heartbeat[5498]: 2011/03/03_04:03:57 debug: Exiting from pid 5498
[rc=15]
heartbeat[5391]: 2011/03/03_04:03:57 info: killing HBWRITE process 5499
with signal 15
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5494 exited. 11
remaining
heartbeat[5419]: 2011/03/03_04:03:57 debug: Exiting from pid 5419
[rc=15]
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5495 exited. 10
remaining
heartbeat[5504]: 2011/03/03_04:03:57 debug: Exiting from pid 5504
[rc=15]
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5497 exited. 9
remaining
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5498 exited. 8
remaining
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5499 exited. 7
remaining
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5501 exited. 6
remaining
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5502 exited. 5
remaining
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5503 exited. 4
remaining
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5419 exited. 3
remaining
heartbeat[5500]: 2011/03/03_04:03:57 debug: Exiting from pid 5500
[rc=15]
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5504 exited. 2
remaining
heartbeat[5391]: 2011/03/03_04:03:57 info: Core process 5500 exited. 1
remaining
heartbeat[5391]: 2011/03/03_04:03:57 info: ftp02.mycharts.md Heartbeat
shutdown complete.
heartbeat[5391]: 2011/03/03_04:03:57 info: Heartbeat restart triggered.
heartbeat[5391]: 2011/03/03_04:03:57 info: Restarting heartbeat.
heartbeat[5391]: 2011/03/03_04:03:57 info: Performing heartbeat restart
exec.
heartbeat[5391]: 2011/03/03_04:04:28 debug: uid=hacluster, gid=haclient
uid=hacluster
heartbeat[5391]: 2011/03/03_04:04:28 debug: uid=hacluster, gid=<null>
heartbeat[5391]: 2011/03/03_04:04:28 debug: uid=<null>, gid=haclient
heartbeat[5391]: 2011/03/03_04:04:28 debug: uid=root, gid=<null>
heartbeat[5391]: 2011/03/03_04:04:28 debug: uid=<null>, gid=haclient
heartbeat[5391]: 2011/03/03_04:04:28 debug: Beginning authentication
parsing
heartbeat[5391]: 2011/03/03_04:04:28 debug: 16 max authentication
methods
heartbeat[5391]: 2011/03/03_04:04:28 debug: Keyfile opened
heartbeat[5391]: 2011/03/03_04:04:28 debug: Keyfile perms OK
heartbeat[5391]: 2011/03/03_04:04:28 debug: 16 max authentication
methods
heartbeat[5391]: 2011/03/03_04:04:28 debug: Found authentication method
[sha1]
heartbeat[5391]: 2011/03/03_04:04:28 info: AUTH: i=2: key = 0x1412cf48,
auth=0x2b4e4e66a610, authname=sha1
heartbeat[5391]: 2011/03/03_04:04:28 debug: Outbound signing method is 2
heartbeat[5391]: 2011/03/03_04:04:28 debug: Authentication parsing
complete [1]
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(cluster,linux-ha)
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(hopfudge,1)
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(hbgenmethod,file)
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(realtime,true)
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(msgfmt,classic)
heartbeat[5391]: 2011/03/03_04:04:28 debug:
add_option(conn_logd_time,60)
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(log_badpack,true)
heartbeat[5391]: 2011/03/03_04:04:28 debug:
add_option(syslogmsgfmt,false)
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(coredumps,true)
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(crm,false)
heartbeat[5391]: 2011/03/03_04:04:28 info: Version 2 support: false
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(autojoin,none)
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(uuidfrom,file)
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(compression,zlib)
heartbeat[5391]: 2011/03/03_04:04:28 debug:
add_option(compression_threshold,2)
heartbeat[5391]: 2011/03/03_04:04:28 debug:
add_option(max_rexmit_delay,250)
heartbeat[5391]: 2011/03/03_04:04:28 debug: Setting max_rexmit_delay to
250 ms
heartbeat[5391]: 2011/03/03_04:04:28 debug:
add_option(record_config_changes,on)
heartbeat[5391]: 2011/03/03_04:04:28 debug:
add_option(record_pengine_inputs,on)
heartbeat[5391]: 2011/03/03_04:04:28 debug:
add_option(enable_config_writes,on)
heartbeat[5391]: 2011/03/03_04:04:28 debug: add_option(memreserve,6500)
heartbeat[5391]: 2011/03/03_04:04:28 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[5391]: 2011/03/03_04:04:28 info: **************************
heartbeat[5391]: 2011/03/03_04:04:28 info: Configuration validated.
Starting heartbeat 2.1.3
heartbeat[5391]: 2011/03/03_04:04:28 debug: HA configuration OK.
Heartbeat starting.
heartbeat[18456]: 2011/03/03_04:04:28 info: heartbeat: version 2.1.3
heartbeat[18456]: 2011/03/03_04:04:28 info: Heartbeat generation:
1248828643
heartbeat[18456]: 2011/03/03_04:04:28 debug: uuid
is:4ae573d0-2ccd-4f5b-b769-ddcb47da58c9
heartbeat[18456]: 2011/03/03_04:04:28 debug: FIFO process pid: 18458
heartbeat[18456]: 2011/03/03_04:04:28 debug: opening serial /dev/ttyS0
(serial ring)
heartbeat[18456]: 2011/03/03_04:04:28 debug: glib: tty setup on
/dev/ttyS0 complete.
heartbeat[18456]: 2011/03/03_04:04:28 debug: glib: Baud rate set to: 0xe
heartbeat[18456]: 2011/03/03_04:04:28 debug: glib: ti.c_iflag = 0xb2
heartbeat[18456]: 2011/03/03_04:04:28 debug: glib: ti.c_oflag = 0x4
heartbeat[18456]: 2011/03/03_04:04:28 debug: glib: ti.c_cflag =
0x80000cbe
heartbeat[18456]: 2011/03/03_04:04:28 debug: glib: ti.c_lflag = 0x8a30
heartbeat[18456]: 2011/03/03_04:04:28 info: glib: Starting serial
heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat[18456]: 2011/03/03_04:04:28 debug: write process pid: 18466
heartbeat[18456]: 2011/03/03_04:04:28 debug: read child process pid:
18467
heartbeat[18456]: 2011/03/03_04:04:28 debug: make_io_childpair: CREATED
childpair wchan socket 9
heartbeat[18456]: 2011/03/03_04:04:28 debug: make_io_childpair: CREATED
childpair rchan socket 11
heartbeat[18456]: 2011/03/03_04:04:28 debug: opening bcast bond0 (UDP/IP
broadcast)
heartbeat[18456]: 2011/03/03_04:04:28 debug: glib: SO_BINDTODEVICE(r)
set for device bond0
heartbeat[18456]: 2011/03/03_04:04:28 info: glib: UDP Broadcast
heartbeat started on port 698 (698) interface bond0
heartbeat[18456]: 2011/03/03_04:04:28 debug: write process pid: 18468
heartbeat[18456]: 2011/03/03_04:04:28 debug: read child process pid:
18469
heartbeat[18456]: 2011/03/03_04:04:28 info: glib: UDP Broadcast
heartbeat closed on port 698 interface bond0 - Status: 1
heartbeat[18456]: 2011/03/03_04:04:28 debug: make_io_childpair: CREATED
childpair wchan socket 12
heartbeat[18456]: 2011/03/03_04:04:28 debug: make_io_childpair: CREATED
childpair rchan socket 14
heartbeat[18456]: 2011/03/03_04:04:28 debug: opening bcast eth2 (UDP/IP
broadcast)
heartbeat[18456]: 2011/03/03_04:04:28 debug: glib: SO_BINDTODEVICE(r)
set for device eth2
heartbeat[18456]: 2011/03/03_04:04:28 info: glib: UDP Broadcast
heartbeat started on port 698 (698) interface eth2
heartbeat[18456]: 2011/03/03_04:04:28 debug: write process pid: 18470
heartbeat[18456]: 2011/03/03_04:04:28 debug: read child process pid:
18471
heartbeat[18456]: 2011/03/03_04:04:28 info: glib: UDP Broadcast
heartbeat closed on port 698 interface eth2 - Status: 1
heartbeat[18456]: 2011/03/03_04:04:28 debug: make_io_childpair: CREATED
childpair wchan socket 13
heartbeat[18456]: 2011/03/03_04:04:28 debug: make_io_childpair: CREATED
childpair rchan socket 16
heartbeat[18456]: 2011/03/03_04:04:28 debug: opening bcast eth3 (UDP/IP
broadcast)
heartbeat[18456]: 2011/03/03_04:04:28 debug: glib: SO_BINDTODEVICE(r)
set for device eth3
heartbeat[18456]: 2011/03/03_04:04:28 info: glib: UDP Broadcast
heartbeat started on port 698 (698) interface eth3
heartbeat[18456]: 2011/03/03_04:04:28 debug: write process pid: 18472
heartbeat[18456]: 2011/03/03_04:04:28 debug: read child process pid:
18473
heartbeat[18456]: 2011/03/03_04:04:28 info: glib: UDP Broadcast
heartbeat closed on port 698 interface eth3 - Status: 1
heartbeat[18456]: 2011/03/03_04:04:28 debug: make_io_childpair: CREATED
childpair wchan socket 15
heartbeat[18456]: 2011/03/03_04:04:28 debug: make_io_childpair: CREATED
childpair rchan socket 18
heartbeat[18456]: 2011/03/03_04:04:28 debug: opening ping 192.168.10.100
(ping membership)
heartbeat[18456]: 2011/03/03_04:04:28 info: glib: ping heartbeat
started.
heartbeat[18456]: 2011/03/03_04:04:28 debug: write process pid: 18474
heartbeat[18456]: 2011/03/03_04:04:28 debug: read child process pid:
18475
heartbeat[18456]: 2011/03/03_04:04:28 debug: make_io_childpair: CREATED
childpair wchan socket 10
heartbeat[18456]: 2011/03/03_04:04:28 debug: make_io_childpair: CREATED
childpair rchan socket 19
heartbeat[18456]: 2011/03/03_04:04:28 info: G_main_add_TriggerHandler:
Added signal manual handler
heartbeat[18456]: 2011/03/03_04:04:28 info: G_main_add_TriggerHandler:
Added signal manual handler
heartbeat[18456]: 2011/03/03_04:04:28 info: G_main_add_SignalHandler:
Added signal handler for signal 17
heartbeat[18456]: 2011/03/03_04:04:28 debug: Limiting CPU: 42 CPU
seconds every 60000 milliseconds
heartbeat[18456]: 2011/03/03_04:04:28 debug: pid 18456 locked in memory.
heartbeat[18456]: 2011/03/03_04:04:28 debug: Waiting for child processes
to start
heartbeat[18456]: 2011/03/03_04:04:28 info: Local status now set to:
'up'
heartbeat[18456]: 2011/03/03_04:04:28 debug: All your child process are
belong to us
heartbeat[18456]: 2011/03/03_04:04:28 debug: Starting local status
message @ 2000 ms intervals
heartbeat[18456]: 2011/03/03_04:04:28 debug: Forking temp process
write_hostcachedata
heartbeat[18456]: 2011/03/03_04:04:28 info: Managed write_hostcachedata
process 18476 exited with return code 0.
heartbeat[18458]: 2011/03/03_04:04:29 debug: pid 18458 locked in memory.
heartbeat[18458]: 2011/03/03_04:04:29 debug: Limiting CPU: 6 CPU seconds
every 60000 milliseconds
heartbeat[18466]: 2011/03/03_04:04:29 debug: pid 18466 locked in memory.
heartbeat[18466]: 2011/03/03_04:04:29 debug: Limiting CPU: 24 CPU
seconds every 60000 milliseconds
heartbeat[18467]: 2011/03/03_04:04:29 debug: pid 18467 locked in memory.
heartbeat[18467]: 2011/03/03_04:04:29 debug: Limiting CPU: 6 CPU seconds
every 60000 milliseconds
heartbeat[18469]: 2011/03/03_04:04:29 debug: pid 18469 locked in memory.
heartbeat[18468]: 2011/03/03_04:04:29 debug: pid 18468 locked in memory.
heartbeat[18469]: 2011/03/03_04:04:29 debug: Limiting CPU: 6 CPU seconds
every 60000 milliseconds
heartbeat[18468]: 2011/03/03_04:04:29 debug: Limiting CPU: 24 CPU
seconds every 60000 milliseconds
heartbeat[18471]: 2011/03/03_04:04:29 debug: pid 18471 locked in memory.
heartbeat[18471]: 2011/03/03_04:04:29 debug: Limiting CPU: 6 CPU seconds
every 60000 milliseconds
heartbeat[18472]: 2011/03/03_04:04:29 debug: pid 18472 locked in memory.
heartbeat[18472]: 2011/03/03_04:04:29 debug: Limiting CPU: 24 CPU
seconds every 60000 milliseconds
heartbeat[18470]: 2011/03/03_04:04:29 debug: pid 18470 locked in memory.
heartbeat[18470]: 2011/03/03_04:04:29 debug: Limiting CPU: 24 CPU
seconds every 60000 milliseconds
heartbeat[18475]: 2011/03/03_04:04:29 debug: pid 18475 locked in memory.
heartbeat[18475]: 2011/03/03_04:04:29 debug: Limiting CPU: 6 CPU seconds
every 60000 milliseconds
heartbeat[18473]: 2011/03/03_04:04:29 debug: pid 18473 locked in memory.
heartbeat[18473]: 2011/03/03_04:04:29 debug: Limiting CPU: 6 CPU seconds
every 60000 milliseconds
heartbeat[18474]: 2011/03/03_04:04:29 debug: pid 18474 locked in memory.
heartbeat[18474]: 2011/03/03_04:04:29 debug: Limiting CPU: 24 CPU
seconds every 60000 milliseconds
heartbeat[18456]: 2011/03/03_04:04:29 info: Link
192.168.10.100:192.168.10.100 up.
heartbeat[18456]: 2011/03/03_04:04:29 debug: CreateInitialFilter: status
heartbeat[18456]: 2011/03/03_04:04:29 debug: CreateInitialFilter:
ask_resources
heartbeat[18456]: 2011/03/03_04:04:29 debug: CreateInitialFilter:
ip-request
heartbeat[18456]: 2011/03/03_04:04:29 debug: CreateInitialFilter:
ip-request-resp
heartbeat[18456]: 2011/03/03_04:04:29 debug: CreateInitialFilter:
hb_takeover
heartbeat[18456]: 2011/03/03_04:04:29 info: Status update for node
192.168.10.100: status ping
heartbeat[18456]: 2011/03/03_04:04:29 debug: Status seqno: 0 msgtime:
1299153868
heartbeat[18456]: 2011/03/03_04:04:29 info: Link ftp02.mycharts.md:bond0
up.
heartbeat[18456]: 2011/03/03_04:04:29 info: Link ftp02.mycharts.md:eth2
up.
heartbeat[18456]: 2011/03/03_04:04:29 info: Link ftp02.mycharts.md:eth3
up.
heartbeat[18456]: 2011/03/03_04:04:30 WARN: string2msg_ll: node
[ftp01.mycharts.md] failed authentication
heartbeat[18456]: 2011/03/03_04:04:30 info: Link ftp01.mycharts.md:bond0
up.
heartbeat[18456]: 2011/03/03_04:04:30 debug: sending reqnodes msg to
node ftp01.mycharts.md
heartbeat[18456]: 2011/03/03_04:04:30 info: Link ftp01.mycharts.md:eth2
up.
heartbeat[18456]: 2011/03/03_04:04:30 info: Link
ftp01.mycharts.md:/dev/ttyS0 up.
heartbeat[18456]: 2011/03/03_04:04:30 debug: Forking temp process
write_hostcachedata
heartbeat[18456]: 2011/03/03_04:04:30 info: Managed write_hostcachedata
process 18512 exited with return code 0.
heartbeat[18456]: 2011/03/03_04:04:30 info: Status update for node
ftp01.mycharts.md: status active
heartbeat[18456]: 2011/03/03_04:04:30 debug: Status seqno: 568961
msgtime: 1299153870
heartbeat[18456]: 2011/03/03_04:04:30 debug: StartNextRemoteRscReq() -
calling hook
heartbeat[18456]: 2011/03/03_04:04:30 debug: notify_world: invoking
harc: OLD status: up
heartbeat[18456]: 2011/03/03_04:04:30 debug: Process [status] started
pid 18540
heartbeat[18456]: 2011/03/03_04:04:30 debug: Starting notify process
[status]
heartbeat[18456]: 2011/03/03_04:04:30 debug: Get a repnodes msg from
ftp01.mycharts.md
heartbeat[18456]: 2011/03/03_04:04:30 debug: nodelist
received:ftp01.mycharts.md ftp02.mycharts.md 
heartbeat[18456]: 2011/03/03_04:04:30 info: Comm_now_up(): updating
status to active
heartbeat[18456]: 2011/03/03_04:04:30 info: Local status now set to:
'active'
heartbeat[18456]: 2011/03/03_04:04:30 debug: Sending local starting msg:
resourcestate = 0
heartbeat[18456]: 2011/03/03_04:04:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 0
heartbeat[18456]: 2011/03/03_04:04:30 info: Starting child client
"/usr/lib/heartbeat/ipfail" (498,496)
heartbeat[18456]: 2011/03/03_04:04:30 debug: Forking temp process
write_hostcachedata
heartbeat[18541]: 2011/03/03_04:04:30 info: Starting
"/usr/lib/heartbeat/ipfail" as uid 498  gid 496 (pid 18541)
heartbeat[18456]: 2011/03/03_04:04:30 debug: Forking temp process
write_delcachedata
heartbeat[18540]: 2011/03/03_04:04:30 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
heartbeat[18540]: 2011/03/03_04:04:30 debug: notify_world: Running harc
status
heartbeat[18456]: 2011/03/03_04:04:30 info: Managed write_hostcachedata
process 18542 exited with return code 0.
heartbeat[18456]: 2011/03/03_04:04:30 info: Managed write_delcachedata
process 18543 exited with return code 0.
ipfail[18541]: 2011/03/03_04:04:30 debug: PID=18541
ipfail[18541]: 2011/03/03_04:04:30 debug: Signing in with heartbeat
heartbeat[18456]: 2011/03/03_04:04:30 debug: APIregistration_dispatch()
{
heartbeat[18456]: 2011/03/03_04:04:30 debug: process_registerevent() {
heartbeat[18456]: 2011/03/03_04:04:30 debug: client->gsource =
0x14151c28
heartbeat[18456]: 2011/03/03_04:04:30 debug: }/*process_registerevent*/;
heartbeat[18456]: 2011/03/03_04:04:30 debug:
}/*APIregistration_dispatch*/;
heartbeat[18456]: 2011/03/03_04:04:30 debug: Checking client
authorization for client ipfail (498:496)
heartbeat[18456]: 2011/03/03_04:04:30 debug:
create_seq_snapshot_table:no missing packets found for node
ftp01.mycharts.md
heartbeat[18456]: 2011/03/03_04:04:30 debug:
create_seq_snapshot_table:no missing packets found for node
ftp02.mycharts.md
heartbeat[18456]: 2011/03/03_04:04:30 debug: Signing on API client 18541
(ipfail)
harc[18540]: 2011/03/03_04:04:30 info: Running /etc/ha.d/rc.d/status
status
heartbeat[18456]: 2011/03/03_04:04:30 info: Managed status process 18540
exited with return code 0.
heartbeat[18456]: 2011/03/03_04:04:30 debug: RscMgmtProc 'status' exited
code 0
heartbeat[18456]: 2011/03/03_04:04:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 1
ipfail[18541]: 2011/03/03_04:04:30 debug: [We are ftp02.mycharts.md]
heartbeat[18456]: 2011/03/03_04:04:31 info: AnnounceTakeover(local 0,
foreign 1, reason 'T_RESOURCES' (0))
heartbeat[18456]: 2011/03/03_04:04:31 info: remote resource transition
completed.
heartbeat[18456]: 2011/03/03_04:04:31 debug: Sending hold resources msg:
none, stable=0 # <none>
heartbeat[18456]: 2011/03/03_04:04:31 info: AnnounceTakeover(local 0,
foreign 1, reason 'T_RESOURCES' (0))
heartbeat[18456]: 2011/03/03_04:04:31 info: STATE 1 => 3
heartbeat[18456]: 2011/03/03_04:04:31 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 3
heartbeat[18456]: 2011/03/03_04:04:31 debug: Calling
PerformAutoFailback()
heartbeat[18456]: 2011/03/03_04:04:31 info: other_holds_resources: 3
heartbeat[18456]: 2011/03/03_04:04:31 info: remote resource transition
completed.
heartbeat[18456]: 2011/03/03_04:04:31 info: Local Resource acquisition
completed. (none)
heartbeat[18456]: 2011/03/03_04:04:31 debug: Sending hold resources msg:
none, stable=1 # <none>
heartbeat[18456]: 2011/03/03_04:04:31 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 3
heartbeat[18456]: 2011/03/03_04:04:31 debug: Calling
PerformAutoFailback()
heartbeat[18456]: 2011/03/03_04:04:31 info: AnnounceTakeover(local 1,
foreign 1, reason 'T_RESOURCES(them)' (0))
heartbeat[18456]: 2011/03/03_04:04:31 info: Initial resource acquisition
complete (T_RESOURCES(them))
heartbeat[18456]: 2011/03/03_04:04:31 info: STATE 3 => 4
heartbeat[18456]: 2011/03/03_04:04:31 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 4
ipfail[18541]: 2011/03/03_04:04:31 debug: auto_failback -> 0 (off)
ipfail[18541]: 2011/03/03_04:04:31 debug: Setting message filter mode
heartbeat[18456]: 2011/03/03_04:04:31 info: other_holds_resources: 3
heartbeat[18456]: 2011/03/03_04:04:31 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 4
ipfail[18541]: 2011/03/03_04:04:31 debug: Starting node walk
ipfail[18541]: 2011/03/03_04:04:32 debug: Cluster node: 192.168.10.100:
status: ping
ipfail[18541]: 2011/03/03_04:04:32 debug: Cluster node:
ftp02.mycharts.md: status: active
ipfail[18541]: 2011/03/03_04:04:33 debug: Cluster node:
ftp01.mycharts.md: status: active
ipfail[18541]: 2011/03/03_04:04:33 debug: [They are ftp01.mycharts.md]
ipfail[18541]: 2011/03/03_04:04:33 debug: Setting message signal
ipfail[18541]: 2011/03/03_04:04:33 debug: Waiting for messages...
ipfail[18541]: 2011/03/03_04:04:34 debug: Other side is now stable.
ipfail[18541]: 2011/03/03_04:04:34 debug: Other side is now stable.
ipfail[18541]: 2011/03/03_04:04:34 debug: Got asked for num_ping.
ipfail[18541]: 2011/03/03_04:04:35 debug: Found ping node
192.168.10.100!
ipfail[18541]: 2011/03/03_04:04:36 info: Ping node count is balanced.
ipfail[18541]: 2011/03/03_04:04:36 debug: Abort message sent.
heartbeat[18456]: 2011/03/03_04:14:29 info: Daily informational memory
statistics
 
 
//////////////////////////////////////////////////////////////
 
 
Now here is the ha-debug from server ftp01, which was secondary before
the failover:
 
 
 
 
heartbeat[5280]: 2011/03/03_04:03:38 WARN: Gmain_timeout_dispatch:
Dispatch function for memory stats took too long to execute: 48880 ms (>
100 ms) (GSource: 0xb29108)
heartbeat[5280]: 2011/03/03_04:03:38 WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 47980 ms (> 1010 ms)
before being called (GSource: 0xb28398)
heartbeat[5280]: 2011/03/03_04:03:40 info: Gmain_timeout_dispatch:
started at 542452156 should have started at 542447358
heartbeat[5280]: 2011/03/03_04:03:41 WARN: Gmain_timeout_dispatch:
Dispatch function for send local status took too long to execute: 2480
ms (> 50 ms) (GSource: 0xb28398)
heartbeat[5280]: 2011/03/03_04:03:44 WARN: Late heartbeat: Node
ftp02.mycharts.md: interval 53380 ms
heartbeat[5280]: 2011/03/03_04:03:44 WARN: G_CH_dispatch_int: Dispatch
function for read child took too long to execute: 3280 ms (> 50 ms)
(GSource: 0xb1c388)
heartbeat[5280]: 2011/03/03_04:03:44 WARN: G_CH_dispatch_int: Dispatch
function for read child was delayed 3290 ms (> 500 ms) before being
called (GSource: 0xb1ee98)
heartbeat[5280]: 2011/03/03_04:03:44 info: G_CH_dispatch_int: started at
542452778 should have started at 542452449
heartbeat[5280]: 2011/03/03_04:03:44 WARN: G_CH_dispatch_int: Dispatch
function for read child was delayed 3290 ms (> 500 ms) before being
called (GSource: 0xb1f7d8)
heartbeat[5280]: 2011/03/03_04:03:44 info: G_CH_dispatch_int: started at
542452778 should have started at 542452449
heartbeat[5280]: 2011/03/03_04:03:44 WARN: G_CH_dispatch_int: Dispatch
function for read child was delayed 3280 ms (> 500 ms) before being
called (GSource: 0xb20528)
heartbeat[5280]: 2011/03/03_04:03:44 info: G_CH_dispatch_int: started at
542452778 should have started at 542452450
heartbeat[5280]: 2011/03/03_04:03:44 WARN: Late heartbeat: Node
192.168.10.100: interval 56190 ms
heartbeat[5280]: 2011/03/03_04:03:44 WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 1740 ms (> 1010 ms)
before being called (GSource: 0xb28398)
heartbeat[5280]: 2011/03/03_04:03:44 info: Gmain_timeout_dispatch:
started at 542452778 should have started at 542452604
ipfail[5456]: 2011/03/03_04:03:44 debug: Got asked for num_ping.
heartbeat[5280]: 2011/03/03_04:03:44 debug: process_resources(2):  other
now unstable
heartbeat[5280]: 2011/03/03_04:03:44 info: other_holds_resources: 0
heartbeat[5280]: 2011/03/03_04:03:44 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[5280]: 2011/03/03_04:03:44 WARN: Gmain_timeout_dispatch:
Dispatch function for check for signals was delayed 54200 ms (> 1010 ms)
before being called (GSource: 0xb29698)
heartbeat[5280]: 2011/03/03_04:03:44 info: Gmain_timeout_dispatch:
started at 542452779 should have started at 542447359
heartbeat[5280]: 2011/03/03_04:03:44 WARN: Gmain_timeout_dispatch:
Dispatch function for update msgfree count was delayed 55170 ms (> 30000
ms) before being called (GSource: 0xb29818)
heartbeat[5280]: 2011/03/03_04:03:44 info: Gmain_timeout_dispatch:
started at 542452779 should have started at 542447262
ipfail[5456]: 2011/03/03_04:03:44 debug: Found ping node 192.168.10.100!
heartbeat[5280]: 2011/03/03_04:03:44 WARN: Gmain_timeout_dispatch:
Dispatch function for client audit was delayed 52220 ms (> 5000 ms)
before being called (GSource: 0xb29518)
heartbeat[5280]: 2011/03/03_04:03:44 info: Gmain_timeout_dispatch:
started at 542452779 should have started at 542447557
ipfail[5456]: 2011/03/03_04:03:44 info: Ping node count is balanced.
ipfail[5456]: 2011/03/03_04:03:44 debug: Abort message sent.
ipfail[5456]: 2011/03/03_04:03:45 debug: Got asked for num_ping.
ipfail[5456]: 2011/03/03_04:03:45 debug: Found ping node 192.168.10.100!
ipfail[5456]: 2011/03/03_04:03:46 info: Ping node count is balanced.
ipfail[5456]: 2011/03/03_04:03:46 debug: Abort message sent.
ipfail[5456]: 2011/03/03_04:03:46 debug: Got asked for num_ping.
ipfail[5456]: 2011/03/03_04:03:47 debug: Found ping node 192.168.10.100!
ipfail[5456]: 2011/03/03_04:03:48 info: Ping node count is balanced.
ipfail[5456]: 2011/03/03_04:03:48 debug: Abort message sent.
ipfail[5456]: 2011/03/03_04:03:48 debug: Other side is unstable.
heartbeat[5280]: 2011/03/03_04:03:54 debug: process_resources(4):  other
now stable - T_SHUTDONE
heartbeat[5280]: 2011/03/03_04:03:54 debug: Process [go_standby] started
pid 1142
heartbeat[5280]: 2011/03/03_04:03:54 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress:
0, going_standby: 3, standby running(ms): 0, resourcestate: 4
heartbeat[5280]: 2011/03/03_04:03:54 info: Received shutdown notice from
'ftp02.mycharts.md'.
heartbeat[5280]: 2011/03/03_04:03:54 info: Resources being acquired from
ftp02.mycharts.md.
heartbeat[5280]: 2011/03/03_04:03:54 debug: StartNextRemoteRscReq():
child count 1
heartbeat[5280]: 2011/03/03_04:03:54 debug: takeover_from_node: other
now stable
heartbeat[5280]: 2011/03/03_04:03:54 debug: Process
[req_our_resources(ask)] started pid 1143
heartbeat[1142]: 2011/03/03_04:03:54 info: acquire all HA resources
(standby).
heartbeat[1142]: 2011/03/03_04:03:54 info: go_standby: who: 2 resource
set: all
heartbeat[1142]: 2011/03/03_04:03:54 info: go_standby: (query/action):
(allkeys/takegroup)
heartbeat[1143]: 2011/03/03_04:03:54 debug:
req_our_resources(/usr/share/heartbeat/ResourceManager listkeys
ftp01.mycharts.md)
heartbeat[1143]: 2011/03/03_04:03:54 debug: req_our_resources(): running
[/usr/share/heartbeat/req_resource drbddisk::ha_ftp]
ResourceManager[1169]: 2011/03/03_04:03:55 info: Acquiring resource
group: ftp01.mycharts.md drbddisk::ha_ftp
Filesystem::/dev/drbd0::/ha_ftp::ext3::noatime
IPaddr2::192.168.10.203/24/bond0 pure-ftpd nfs
req_resource[1168]: 2011/03/03_04:03:55 debug: in
/usr/share/heartbeat/req_resource drbddisk::ha_ftp
req_resource[1168]: 2011/03/03_04:03:55 debug: dont_ask:  nice_failback:
yes
heartbeat[1143]: 2011/03/03_04:03:55 info: 1 local resources from
[/usr/share/heartbeat/ResourceManager listkeys ftp01.mycharts.md]
heartbeat[1143]: 2011/03/03_04:03:55 info: Local Resource acquisition
completed.
heartbeat[1143]: 2011/03/03_04:03:55 debug: Sending hold resources msg:
all, stable=1 # req_our_resources()
heartbeat[1143]: 2011/03/03_04:03:55 info: FIFO message [type resource]
written rc=79
heartbeat[5280]: 2011/03/03_04:03:55 debug: StartNextRemoteRscReq():
child count 2
heartbeat[5280]: 2011/03/03_04:03:55 info: AnnounceTakeover(local 1,
foreign 1, reason 'T_RESOURCES(us)' (1))
heartbeat[5280]: 2011/03/03_04:03:55 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 2, other_is_stable: 1, takeover_in_progress:
1, going_standby: 3, standby running(ms): 0, resourcestate: 4
heartbeat[5280]: 2011/03/03_04:03:55 info: Managed
req_our_resources(ask) process 1143 exited with return code 0.
heartbeat[5280]: 2011/03/03_04:03:55 debug: RscMgmtProc
'req_our_resources(ask)' exited code 0
heartbeat[5280]: 2011/03/03_04:03:55 debug: StartNextRemoteRscReq():
child count 1
ResourceManager[1169]: 2011/03/03_04:03:55 info: Running
/etc/ha.d/resource.d/drbddisk ha_ftp start
ResourceManager[1169]: 2011/03/03_04:03:55 debug: Starting
/etc/ha.d/resource.d/drbddisk ha_ftp start
ResourceManager[1169]: 2011/03/03_04:03:55 debug:
/etc/ha.d/resource.d/drbddisk ha_ftp start done. RC=0
Filesystem[1280]: 2011/03/03_04:03:56 INFO:  Resource is stopped
ResourceManager[1169]: 2011/03/03_04:03:56 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_ftp ext3 noatime start
ResourceManager[1169]: 2011/03/03_04:03:56 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_ftp ext3 noatime start
Filesystem[1366]: 2011/03/03_04:03:56 INFO: Running start for /dev/drbd0
on /ha_ftp
Filesystem[1355]: 2011/03/03_04:03:57 INFO:  Success
INFO:  Success
ResourceManager[1169]: 2011/03/03_04:03:57 debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_ftp ext3 noatime start
done. RC=0
IPaddr2[1440]: 2011/03/03_04:03:57 INFO:  Resource is stopped
ResourceManager[1169]: 2011/03/03_04:03:58 info: Running
/etc/ha.d/resource.d/IPaddr2 192.168.10.203/24/bond0 start
ResourceManager[1169]: 2011/03/03_04:03:58 debug: Starting
/etc/ha.d/resource.d/IPaddr2 192.168.10.203/24/bond0 start
IPaddr2[1552]: 2011/03/03_04:03:58 INFO: ip -f inet addr add
192.168.10.203/24 brd 192.168.10.255 dev bond0
IPaddr2[1552]: 2011/03/03_04:03:58 INFO: ip link set bond0 up
IPaddr2[1552]: 2011/03/03_04:03:58 INFO: /usr/lib64/heartbeat/send_arp
-i 200 -r 5 -p
/var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.10.203 bond0
192.168.10.203 auto not_used not_used
IPaddr2[1523]: 2011/03/03_04:03:58 INFO:  Success
INFO:  Success
ResourceManager[1169]: 2011/03/03_04:03:58 debug:
/etc/ha.d/resource.d/IPaddr2 192.168.10.203/24/bond0 start done. RC=0
ResourceManager[1169]: 2011/03/03_04:03:58 info: Running
/etc/init.d/pure-ftpd  start
ResourceManager[1169]: 2011/03/03_04:03:58 debug: Starting
/etc/init.d/pure-ftpd  start
Starting pure-ftpd:                                        [  OK  ]
ResourceManager[1169]: 2011/03/03_04:03:58 debug: /etc/init.d/pure-ftpd
start done. RC=0
ResourceManager[1169]: 2011/03/03_04:03:58 info: Running /etc/init.d/nfs
start
ResourceManager[1169]: 2011/03/03_04:03:58 debug: Starting
/etc/init.d/nfs  start
Starting NFS services:  [  OK  ]
Starting NFS quotas: [  OK  ]
Starting NFS daemon: [  OK  ]
Starting NFS mountd: [  OK  ]
ResourceManager[1169]: 2011/03/03_04:03:59 debug: /etc/init.d/nfs  start
done. RC=0
heartbeat[1142]: 2011/03/03_04:03:59 info: all HA resource acquisition
completed (standby).
heartbeat[1142]: 2011/03/03_04:03:59 debug: Sending standby [done] msg
heartbeat[1142]: 2011/03/03_04:03:59 info: FIFO message [type
ask_resources] written rc=47
heartbeat[5280]: 2011/03/03_04:03:59 debug: Received standby message
done from ftp01.mycharts.md in state 3 
heartbeat[5280]: 2011/03/03_04:03:59 info: Standby resource acquisition
done [all].
heartbeat[5280]: 2011/03/03_04:03:59 debug: Sending hold resources msg:
all, stable=1 # <none>
heartbeat[5280]: 2011/03/03_04:03:59 info: AnnounceTakeover(local 1,
foreign 1, reason 'T_RESOURCES(us)' (1))
heartbeat[5280]: 2011/03/03_04:03:59 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress:
1, going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[5280]: 2011/03/03_04:03:59 info: New standby state: 0
heartbeat[5280]: 2011/03/03_04:03:59 info: Managed go_standby process
1142 exited with return code 0.
heartbeat[5280]: 2011/03/03_04:03:59 debug: RscMgmtProc 'go_standby'
exited code 0
heartbeat[5280]: 2011/03/03_04:03:59 debug: StartNextRemoteRscReq() -
calling hook
heartbeat[5280]: 2011/03/03_04:03:59 debug: notify_world: invoking harc:
OLD status: active
heartbeat[5280]: 2011/03/03_04:03:59 debug: Process [status] started pid
1802
heartbeat[5280]: 2011/03/03_04:03:59 debug: Starting notify process
[status]
heartbeat[1802]: 2011/03/03_04:03:59 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
heartbeat[1802]: 2011/03/03_04:03:59 debug: notify_world: Running harc
status
harc[1802]: 2011/03/03_04:03:59 info: Running /etc/ha.d/rc.d/status
status
mach_down[1818]: 2011/03/03_04:03:59 info:
/usr/share/heartbeat/mach_down: nice_failback: foreign resources
acquired
mach_down[1818]: 2011/03/03_04:03:59 info: mach_down takeover complete
for node ftp02.mycharts.md.
heartbeat[5280]: 2011/03/03_04:03:59 info: AnnounceTakeover(local 1,
foreign 1, reason 'T_RESOURCES(us)' (1))
heartbeat[5280]: 2011/03/03_04:03:59 info: mach_down takeover complete.
heartbeat[5280]: 2011/03/03_04:03:59 debug: process_resources(3):  other
now stable
heartbeat[5280]: 2011/03/03_04:03:59 info: AnnounceTakeover(local 1,
foreign 1, reason 'mach_down' (1))
heartbeat[5280]: 2011/03/03_04:03:59 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[5280]: 2011/03/03_04:03:59 info: Managed status process 1802
exited with return code 0.
heartbeat[5280]: 2011/03/03_04:03:59 debug: RscMgmtProc 'status' exited
code 0
heartbeat[5280]: 2011/03/03_04:03:59 debug: StartNextRemoteRscReq() -
calling hook
heartbeat[5280]: 2011/03/03_04:03:59 debug: notify_world: invoking harc:
OLD status: active
heartbeat[5280]: 2011/03/03_04:03:59 debug: Process [ip-request-resp]
started pid 1852
heartbeat[5280]: 2011/03/03_04:03:59 debug: Starting notify process
[ip-request-resp]
heartbeat[1852]: 2011/03/03_04:03:59 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
heartbeat[1852]: 2011/03/03_04:03:59 debug: notify_world: Running harc
ip-request-resp
harc[1852]: 2011/03/03_04:03:59 info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[1852]: 2011/03/03_04:03:59 received ip-request-resp
drbddisk::ha_ftp OK yes
ResourceManager[1873]: 2011/03/03_04:03:59 info: Acquiring resource
group: ftp01.mycharts.md drbddisk::ha_ftp
Filesystem::/dev/drbd0::/ha_ftp::ext3::noatime
IPaddr2::192.168.10.203/24/bond0 pure-ftpd nfs
Filesystem[1915]: 2011/03/03_04:04:00 INFO:  Running OK
IPaddr2[1975]: 2011/03/03_04:04:00 INFO:  Running OK
heartbeat[5280]: 2011/03/03_04:04:00 info: Managed ip-request-resp
process 1852 exited with return code 0.
heartbeat[5280]: 2011/03/03_04:04:00 debug: RscMgmtProc
'ip-request-resp' exited code 0
heartbeat[5280]: 2011/03/03_04:04:00 info: AnnounceTakeover(local 1,
foreign 1, reason 'ip-request-resp' (1))
heartbeat[5340]: 2011/03/03_04:04:25 WARN: glib: TTY write timeout on
[/dev/ttyS0] (no connection or bad cable? [see documentation])
heartbeat[5340]: 2011/03/03_04:04:25 info: glib: See
http://linux-ha.org/FAQ#TTYtimeout <http://linux-ha.org/FAQ#TTYtimeout>
for details
heartbeat[5280]: 2011/03/03_04:04:26 info: Link ftp02.mycharts.md:bond0
dead.
heartbeat[5280]: 2011/03/03_04:04:26 info: Link ftp02.mycharts.md:eth2
dead.
ipfail[5456]: 2011/03/03_04:04:26 info: Link Status update: Link
ftp02.mycharts.md/bond0 now has status dead
ipfail[5456]: 2011/03/03_04:04:27 debug: Found ping node 192.168.10.100!
heartbeat[5280]: 2011/03/03_04:04:27 info: Link
ftp02.mycharts.md:/dev/ttyS0 dead.
ipfail[5456]: 2011/03/03_04:04:27 info: Asking other side for ping node
count.
ipfail[5456]: 2011/03/03_04:04:27 debug: Message [num_ping] sent.
ipfail[5456]: 2011/03/03_04:04:27 info: Checking remote count of ping
nodes.
ipfail[5456]: 2011/03/03_04:04:27 info: Link Status update: Link
ftp02.mycharts.md/eth2 now has status dead
ipfail[5456]: 2011/03/03_04:04:28 debug: Found ping node 192.168.10.100!
ipfail[5456]: 2011/03/03_04:04:28 info: Asking other side for ping node
count.
ipfail[5456]: 2011/03/03_04:04:28 debug: Message [num_ping] sent.
ipfail[5456]: 2011/03/03_04:04:28 info: Checking remote count of ping
nodes.
ipfail[5456]: 2011/03/03_04:04:28 info: Link Status update: Link
ftp02.mycharts.md//dev/ttyS0 now has status dead
ipfail[5456]: 2011/03/03_04:04:29 debug: Found ping node 192.168.10.100!
ipfail[5456]: 2011/03/03_04:04:29 info: Asking other side for ping node
count.
ipfail[5456]: 2011/03/03_04:04:29 debug: Message [num_ping] sent.
ipfail[5456]: 2011/03/03_04:04:29 info: Checking remote count of ping
nodes.
heartbeat[5280]: 2011/03/03_04:04:30 info: Heartbeat restart on node
ftp02.mycharts.md
heartbeat[5280]: 2011/03/03_04:04:30 info: Link
ftp02.mycharts.md:/dev/ttyS0 up.
heartbeat[5280]: 2011/03/03_04:04:30 info: Status update for node
ftp02.mycharts.md: status init
heartbeat[5280]: 2011/03/03_04:04:30 debug: Status seqno: 1 msgtime:
1299153868
ipfail[5456]: 2011/03/03_04:04:30 info: Link Status update: Link
ftp02.mycharts.md//dev/ttyS0 now has status up
heartbeat[5280]: 2011/03/03_04:04:30 debug: StartNextRemoteRscReq() -
calling hook
heartbeat[5280]: 2011/03/03_04:04:30 debug: notify_world: invoking harc:
OLD status: active
heartbeat[5280]: 2011/03/03_04:04:30 debug: Process [status] started pid
2976
heartbeat[5280]: 2011/03/03_04:04:30 debug: Starting notify process
[status]
ipfail[5456]: 2011/03/03_04:04:30 info: Status update: Node
ftp02.mycharts.md now has status init
heartbeat[5280]: 2011/03/03_04:04:30 info: Link ftp02.mycharts.md:bond0
up.
ipfail[5456]: 2011/03/03_04:04:30 info: Link Status update: Link
ftp02.mycharts.md/bond0 now has status up
heartbeat[5280]: 2011/03/03_04:04:30 info: Link ftp02.mycharts.md:eth2
up.
ipfail[5456]: 2011/03/03_04:04:30 info: Link Status update: Link
ftp02.mycharts.md/eth2 now has status up
heartbeat[5280]: 2011/03/03_04:04:30 info: Status update for node
ftp02.mycharts.md: status up
heartbeat[5280]: 2011/03/03_04:04:30 debug: Status seqno: 2 msgtime:
1299153868
heartbeat[5280]: 2011/03/03_04:04:30 debug: StartNextRemoteRscReq():
child count 1
ipfail[5456]: 2011/03/03_04:04:30 info: Status update: Node
ftp02.mycharts.md now has status up
heartbeat[2976]: 2011/03/03_04:04:30 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
heartbeat[2976]: 2011/03/03_04:04:30 debug: notify_world: Running harc
status
harc[2976]: 2011/03/03_04:04:30 info: Running /etc/ha.d/rc.d/status
status
heartbeat[5280]: 2011/03/03_04:04:30 info: Managed status process 2976
exited with return code 0.
heartbeat[5280]: 2011/03/03_04:04:30 debug: RscMgmtProc 'status' exited
code 0
heartbeat[5280]: 2011/03/03_04:04:30 debug: StartNextRemoteRscReq() -
calling hook
heartbeat[5280]: 2011/03/03_04:04:30 debug: notify_world: invoking harc:
OLD status: active
heartbeat[5280]: 2011/03/03_04:04:30 debug: Process [status] started pid
2992
heartbeat[5280]: 2011/03/03_04:04:30 debug: Starting notify process
[status]
heartbeat[2992]: 2011/03/03_04:04:30 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
heartbeat[2992]: 2011/03/03_04:04:30 debug: notify_world: Running harc
status
harc[2992]: 2011/03/03_04:04:30 info: Running /etc/ha.d/rc.d/status
status
heartbeat[5280]: 2011/03/03_04:04:30 info: Managed status process 2992
exited with return code 0.
heartbeat[5280]: 2011/03/03_04:04:30 debug: RscMgmtProc 'status' exited
code 0
heartbeat[5280]: 2011/03/03_04:04:30 debug: Get a reqnodes message from
ftp02.mycharts.md
heartbeat[5280]: 2011/03/03_04:04:30 debug: get_delnodelist:
delnodelist= 
heartbeat[5280]: 2011/03/03_04:04:31 info: Status update for node
ftp02.mycharts.md: status active
heartbeat[5280]: 2011/03/03_04:04:31 debug: Status seqno: 6 msgtime:
1299153870
heartbeat[5280]: 2011/03/03_04:04:31 debug: StartNextRemoteRscReq() -
calling hook
heartbeat[5280]: 2011/03/03_04:04:31 debug: notify_world: invoking harc:
OLD status: active
heartbeat[5280]: 2011/03/03_04:04:31 debug: Process [status] started pid
3046
heartbeat[5280]: 2011/03/03_04:04:31 debug: Starting notify process
[status]
ipfail[5456]: 2011/03/03_04:04:31 info: Status update: Node
ftp02.mycharts.md now has status active
heartbeat[5280]: 2011/03/03_04:04:31 debug: process_resources: other now
unstable
heartbeat[5280]: 2011/03/03_04:04:31 debug: Sending hold resources msg:
all, stable=1 # <none>
heartbeat[5280]: 2011/03/03_04:04:31 info: AnnounceTakeover(local 1,
foreign 1, reason 'T_RESOURCES(us)' (1))
heartbeat[5280]: 2011/03/03_04:04:31 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[5280]: 2011/03/03_04:04:31 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[3046]: 2011/03/03_04:04:31 debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
heartbeat[3046]: 2011/03/03_04:04:31 debug: notify_world: Running harc
status
ipfail[5456]: 2011/03/03_04:04:31 debug: Got join message from another
ipfail client. (ftp02.mycharts.md)
harc[3046]: 2011/03/03_04:04:31 info: Running /etc/ha.d/rc.d/status
status
heartbeat[5280]: 2011/03/03_04:04:31 info: Managed status process 3046
exited with return code 0.
heartbeat[5280]: 2011/03/03_04:04:31 debug: RscMgmtProc 'status' exited
code 0
ipfail[5456]: 2011/03/03_04:04:31 debug: Found ping node 192.168.10.100!
heartbeat[5280]: 2011/03/03_04:04:31 debug: process_resources(2):  other
now unstable
heartbeat[5280]: 2011/03/03_04:04:31 info: other_holds_resources: 0
heartbeat[5280]: 2011/03/03_04:04:31 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[5280]: 2011/03/03_04:04:31 info: remote resource transition
completed.
heartbeat[5280]: 2011/03/03_04:04:31 debug: Sending hold resources msg:
all, stable=1 # <none>
heartbeat[5280]: 2011/03/03_04:04:31 info: AnnounceTakeover(local 1,
foreign 1, reason 'T_RESOURCES(us)' (1))
heartbeat[5280]: 2011/03/03_04:04:31 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[5280]: 2011/03/03_04:04:31 debug: Calling
PerformAutoFailback()
heartbeat[5280]: 2011/03/03_04:04:31 info: other_holds_resources: 0
heartbeat[5280]: 2011/03/03_04:04:31 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress:
0, going_standby: 0, standby running(ms): 0, resourcestate: 4
ipfail[5456]: 2011/03/03_04:04:31 info: Asking other side for ping node
count.
ipfail[5456]: 2011/03/03_04:04:31 debug: Message [num_ping] sent.
ipfail[5456]: 2011/03/03_04:04:31 debug: Other side is unstable.
ipfail[5456]: 2011/03/03_04:04:31 debug: Other side is now stable.
ipfail[5456]: 2011/03/03_04:04:37 info: No giveup timer to abort.
heartbeat[5280]: 2011/03/03_04:12:49 info: Daily informational memory
statistics

 
 

--
Eric Robinson


 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 


Disclaimer - March 3, 2011 
This email and any files transmitted with it are confidential and intended 
solely for General Linux-HA mailing list. If you are not the named addressee 
you should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physicians' Managed Care or Physician Select Management. 
Warning: Although Physicians' Managed Care or Physician Select Management has 
taken reasonable precautions to ensure no viruses are present in this email, 
the company cannot accept responsibility for any loss or damage arising from 
the use of this email or attachments. 
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Reply via email to