Hi all,

I have a problem with my heartbeat configuration. I have two machines that should work in active/passive failover mode. After a few initial problems with the heartbeat v2 configuration, I switched to a heartbeat v1 configuration to keep things as simple as possible for a start.
When I boot the master node while the slave isn't started at all, the machine boots up just fine, the resources get started and everything works as expected. If I boot the slave and then shut down the master after a while, the slave takes over the resources, logs some errors relating to ipsec and then gets rebooted. The same may happen when the master is rebooted and tries to take the resources back.

To be honest I'm a bit puzzled. I don't know where the errors come from or how to debug this further, so I'm trying to find some help on the mailing list. I read on wiki.linux-ha.org that it's normal behaviour to reboot the machine if errors occur while stopping resources. I think that's where the reboots come from, but I don't know why the machine has problems stopping the resource. According to the log file the resource gets stopped correctly, but heartbeat tries to stop it over and over again. I configured the init scripts so that they comply with the rules in http://wiki.linux-ha.org/LSBResourceAgent.

I attached some configs and log snippets. I'd be glad if anyone could help shed some light on this ;)

TIA
Marc

Both machines run Ubuntu Server LTS 8.04, patched up to date.

Linux heartbeat-1 2.6.24-24-server #1 SMP Tue Jul 7 20:21:17 UTC 2009 i686 GNU/Linux

ii heartbeat-2 2.1.3-2 Subsystem for High-Availability Linux
ii openswan 1:2.4.9+dfsg-1build1 IPSEC utilities for Openswan

/etc/ha.d/haresources:
heartbeat-1 10.85.118.245 ipsec

Jul 21 16:43:50 heartbeat-1 heartbeat: [4566]: info: Link heartbeat-1:eth0 up.
Jul 21 16:43:50 heartbeat-1 harc[4656]: info: Running /etc/ha.d/rc.d/status status
Jul 21 16:43:51 heartbeat-1 heartbeat: [4566]: info: Comm_now_up(): updating status to active
Jul 21 16:43:51 heartbeat-1 heartbeat: [4566]: info: Local status now set to: 'active'
Jul 21 16:43:51 heartbeat-1 heartbeat: [4566]: info: remote resource transition completed.
Jul 21 16:43:51 heartbeat-1 heartbeat: [4566]: info: remote resource transition completed.
Jul 21 16:43:51 heartbeat-1 heartbeat: [4566]: info: Local Resource acquisition completed. (none)
Jul 21 16:43:52 heartbeat-1 heartbeat: [4566]: info: heartbeat-2 wants to go standby [foreign]
Jul 21 16:43:52 heartbeat-1 heartbeat: [4566]: info: standby: acquire [foreign] resources from heartbeat-2
Jul 21 16:43:52 heartbeat-1 heartbeat: [4673]: info: acquire local HA resources (standby).
Jul 21 16:43:52 heartbeat-1 heartbeat: [4673]: info: local HA resource acquisition completed (standby).
Jul 21 16:43:52 heartbeat-1 heartbeat: [4566]: info: Standby resource acquisition done [foreign].
Jul 21 16:43:52 heartbeat-1 heartbeat: [4566]: info: Initial resource acquisition complete (auto_failback)
Jul 21 16:43:53 heartbeat-1 heartbeat: [4566]: info: remote resource transition completed.
Jul 21 16:44:21 heartbeat-1 kernel: [ 164.698146] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input4
Jul 21 16:45:26 heartbeat-1 heartbeat: [4566]: info: Received shutdown notice from 'heartbeat-2'.
Jul 21 16:45:26 heartbeat-1 heartbeat: [4566]: info: Resources being acquired from heartbeat-2.
Jul 21 16:45:26 heartbeat-1 heartbeat: [4728]: info: acquire local HA resources (standby).
Jul 21 16:45:26 heartbeat-1 heartbeat: [4729]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys heartbeat-1] to acquire.
Jul 21 16:45:26 heartbeat-1 heartbeat: [4728]: info: local HA resource acquisition completed (standby).
Jul 21 16:45:26 heartbeat-1 heartbeat: [4566]: info: Standby resource acquisition done [foreign].
Jul 21 16:45:26 heartbeat-1 harc[4754]: info: Running /etc/ha.d/rc.d/status status
Jul 21 16:45:26 heartbeat-1 mach_down[4768]: info: Taking over resource group 10.85.118.245
Jul 21 16:45:26 heartbeat-1 ResourceManager[4792]: info: Acquiring resource group: heartbeat-2 10.85.118.245 ipsec
Jul 21 16:45:26 heartbeat-1 IPaddr[4818]: INFO: Resource is stopped
Jul 21 16:45:26 heartbeat-1 ResourceManager[4792]: info: Running /etc/ha.d/resource.d/IPaddr 10.85.118.245 start
Jul 21 16:45:27 heartbeat-1 IPaddr[4889]: INFO: Using calculated nic for 10.85.118.245: eth2
Jul 21 16:45:27 heartbeat-1 IPaddr[4889]: INFO: Using calculated netmask for 10.85.118.245: 255.255.0.0
Jul 21 16:45:27 heartbeat-1 IPaddr[4889]: INFO: eval ifconfig eth2:0 10.85.118.245 netmask 255.255.0.0 broadcast 10.85.255.255
Jul 21 16:45:27 heartbeat-1 IPaddr[4874]: INFO: Success
Jul 21 16:45:27 heartbeat-1 kernel: [ 230.782186] NET: Registered protocol family 17
Jul 21 16:45:27 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec start
Jul 21 16:45:27 heartbeat-1 kernel: [ 231.002200] NET: Registered protocol family 15
Jul 21 16:45:27 heartbeat-1 kernel: [ 231.582742] Initializing XFRM netlink socket
Jul 21 16:45:28 heartbeat-1 ResourceManager[4792]: CRIT: Giving up resources due to failure of ipsec
Jul 21 16:45:28 heartbeat-1 ResourceManager[4792]: info: Releasing resource group: heartbeat-2 10.85.118.245 ipsec
Jul 21 16:45:28 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec stop
Jul 21 16:45:30 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:30 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec stop
Jul 21 16:45:30 heartbeat-1 kernel: [ 234.467984] NET: Unregistered protocol family 15
Jul 21 16:45:31 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:31 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec stop
Jul 21 16:45:32 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:32 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec stop
Jul 21 16:45:33 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:34 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec stop
Jul 21 16:45:35 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:35 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec stop
Jul 21 16:45:36 heartbeat-1 ResourceManager[4792]: info: Retrying failed stop operation [ipsec]
Jul 21 16:45:36 heartbeat-1 ResourceManager[4792]: info: Running /etc/init.d/ipsec stop
Jul 21 17:08:58 heartbeat-1 syslogd 1.5.0#1ubuntu1: restart.

daemon.log
Jul 21 16:43:49 heartbeat-1 logd: [4486]: info: logd started with default configuration.
Jul 21 16:43:49 heartbeat-1 logd: [4486]: WARN: Core dumps could be lost if multiple dumps occur.
Jul 21 16:43:49 heartbeat-1 logd: [4486]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Jul 21 16:43:49 heartbeat-1 logd: [4486]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jul 21 16:43:49 heartbeat-1 logd: [4490]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Jul 21 16:43:49 heartbeat-1 logd: [4486]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Jul 21 16:43:49 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:43:49 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:43:49 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:43:49 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:27 heartbeat-1 ipsec_setup: NETKEY on eth2 10.85.118.241/255.255.0.0 broadcast 10.85.255.255
Jul 21 16:45:28 heartbeat-1 ipsec_setup: ...Openswan IPsec started
Jul 21 16:45:28 heartbeat-1 ipsec_setup: Starting Openswan IPsec 2.4.9...
Jul 21 16:45:28 heartbeat-1 rmmod: ERROR: Module af_key is in use
Jul 21 16:45:28 heartbeat-1 rmmod: ERROR: Module xfrm_user is in use
Jul 21 16:45:28 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:28 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:29 heartbeat-1 ipsec__plutorun: 104 "sup-test" #1: STATE_MAIN_I1: initiate
Jul 21 16:45:29 heartbeat-1 ipsec__plutorun: ...could not start conn "sup-test"
Jul 21 16:45:30 heartbeat-1 rmmod: ERROR: Module xfrm_user is in use
Jul 21 16:45:30 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:30 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:30 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:30 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:31 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:31 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:31 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:31 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:32 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:32 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:32 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:32 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:34 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:34 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:34 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:34 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:35 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:35 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:35 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:35 heartbeat-1 ipsec_setup: doing cleanup anyway...
Jul 21 16:45:36 heartbeat-1 ipsec_setup: ...Openswan IPsec stopped
Jul 21 16:45:36 heartbeat-1 ipsec_setup: Stopping Openswan IPsec...
Jul 21 16:45:36 heartbeat-1 ipsec_setup: stop ordered, but IPsec does not appear to be running!
Jul 21 16:45:36 heartbeat-1 ipsec_setup: doing cleanup anyway...

_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
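The post states that the init scripts were made to follow http://wiki.linux-ha.org/LSBResourceAgent, while the logs show ResourceManager repeating "Retrying failed stop operation [ipsec]" even as Openswan reports "stop ordered, but IPsec does not appear to be running!". One way to double-check compliance outside of heartbeat is to drive the init script by hand through start/status/stop and compare each exit code with what the LSB conventions expect. The sketch below is a minimal, hypothetical helper (`lsb_check` is not part of heartbeat or Openswan). It assumes the usual LSB expectations: 0 for start and stop even when the service is already in that state, 0 for status while running, and 3 for status while stopped. It really starts and stops the service, so it should only be run on a node that does not currently hold the resources.

```shell
# lsb_check: hypothetical helper, not part of heartbeat or Openswan.
# Runs an init script through start/status/stop and compares each exit
# code with the value the LSB conventions are assumed to require:
# 0 for start/stop (even if already started/stopped), 0 for status while
# running, 3 for status while stopped.
# WARNING: it really starts and stops the service.
lsb_check() {
    script="$1"
    if [ ! -x "$script" ]; then
        echo "not an executable init script: $script" >&2
        return 2
    fi
    failures=0
    # Each step is "action expected_exit_code", executed in this order.
    for step in "start 0" "status 0" "start 0" "stop 0" "status 3" "stop 0"; do
        set -- $step            # split the step into $1 (action) and $2 (code)
        rc=0
        "$script" "$1" || rc=$?
        if [ "$rc" -eq "$2" ]; then
            echo "ok:   $1 -> $rc"
        else
            echo "FAIL: $1 -> $rc (expected $2)"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]       # status 0 only if every step matched
}
```

Usage would be something like `. ./lsb_check.sh && lsb_check /etc/init.d/ipsec`. The init script's own output is left visible on purpose, so messages such as the `rmmod` errors show up next to the exit code they produced. A FAIL on the last `stop` would mean the script reports an error when IPsec is already stopped, which is one plausible explanation for the retry loop above, though the log alone does not prove it.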
