Hi,

I started using Heartbeat this morning and I am new to the Linux-HA project, so please bear with me if my questions are very basic.
Here is my first question: are the heartbeat and cluster-glue packages enough to test a simple HA scenario? If so, here is my situation: I compiled cluster-glue and heartbeat successfully and tried to test a simple scenario (just bringing up an IP address or httpd), but it did not work as I expected. I would be grateful if somebody could give me a hint.

sg168:

authkeys:
auth 1
1 sha1 myheartbeat

ha.cf:
logfacility local0
auto_failback on
logfile /var/log/ha-log
debugfile /var/log/ha-debug
debug 1
keepalive 2
deadtime 15
warntime 10
initdead 120
udpport 694
#bcast eth3
ucast eth3 192.168.50.17
node sg168 # in both nodes command #uname -n should
node sg169 # give the these hostnames
pacemaker off

haresources:
sg168 IPaddr::192.168.20.222/24/eth0

debug:
Dec 30 05:58:37 sg168 heartbeat: [7095]: info: Configuration validated. Starting heartbeat 3.0.5
Dec 30 05:58:37 sg168 heartbeat: [7095]: debug: HA configuration OK. Heartbeat starting.
Dec 30 05:58:37 sg168 heartbeat: [7095]: info: Heartbeat Hg Version: node: 7e3a82377fa8c88b4d9ee47e29020d4531f4629a
Dec 30 05:58:37 sg168 heartbeat: [7096]: info: heartbeat: version 3.0.5
Dec 30 05:58:38 sg168 heartbeat: [7096]: info: Heartbeat generation: 1356801148
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: uuid is:c885d691-407e-4fb1-8214-d28deb39470d
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: FIFO process pid: 7099
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: opening ucast eth3 (UDP/IP unicast)
Dec 30 05:58:38 sg168 heartbeat: [7096]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth3
Dec 30 05:58:38 sg168 heartbeat: [7096]: info: glib: ucast: bound send socket to device: eth3
Dec 30 05:58:38 sg168 heartbeat: [7096]: info: glib: ucast: bound receive socket to device: eth3
Dec 30 05:58:38 sg168 heartbeat: [7096]: info: glib: ucast: started on port 694 interface eth3 to 192.168.50.17
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: write process pid: 7100
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: read child process pid: 7101
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: make_io_childpair: CREATED childpair wchan socket 11
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: make_io_childpair: CREATED childpair rchan socket 13
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: Limiting CPU: 42 CPU seconds every 60000 milliseconds
Dec 30 05:58:38 sg168 heartbeat: [7099]: debug: pid 7099 locked in memory.
Dec 30 05:58:38 sg168 heartbeat: [7099]: debug: Limiting CPU: 6 CPU seconds every 59999 milliseconds
Dec 30 05:58:38 sg168 heartbeat: [7101]: debug: pid 7101 locked in memory.
Dec 30 05:58:38 sg168 heartbeat: [7101]: debug: Limiting CPU: 6 CPU seconds every 59999 milliseconds
Dec 30 05:58:38 sg168 heartbeat: [7100]: debug: pid 7100 locked in memory.
Dec 30 05:58:38 sg168 heartbeat: [7100]: debug: Limiting CPU: 24 CPU seconds every 59999 milliseconds
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: pid 7096 locked in memory.
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: Waiting for child processes to start
Dec 30 05:58:38 sg168 heartbeat: [7096]: info: Local status now set to: 'up'
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: All your child process are belong to us
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: Starting local status message @ 2000 ms intervals
Dec 30 05:58:38 sg168 heartbeat: [7096]: debug: Forking temp process write_hostcachedata
Dec 30 05:58:38 sg168 heartbeat: [7096]: info: Managed write_hostcachedata process 7102 exited with return code 0.
Dec 30 05:58:48 sg168 heartbeat: [7096]: info: Link sg169:eth3 up.
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: CreateInitialFilter: ip-request
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: CreateInitialFilter: ask_resources
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: CreateInitialFilter: status
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: CreateInitialFilter: ip-request-resp
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: CreateInitialFilter: hb_takeover
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: sending reqnodes msg to node sg169
Dec 30 05:58:48 sg168 heartbeat: [7096]: info: Status update for node sg169: status up
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: Status seqno: 2 msgtime: 1356834534
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: StartNextRemoteRscReq() - calling hook
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: notify_world: invoking harc: OLD status: up
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: Process [status] started pid 7103
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: Starting notify process [status]
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: Forking temp process write_hostcachedata
Dec 30 05:58:48 sg168 heartbeat: [7103]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Dec 30 05:58:48 sg168 heartbeat: [7103]: debug: notify_world: Running harc status
Dec 30 05:58:48 sg168 heartbeat: [7096]: info: Managed write_hostcachedata process 7104 exited with return code 0.
Dec 30 05:58:48 sg168 heartbeat: [7096]: WARN: Managed status process 7103 exited with return code 1.
Dec 30 05:58:48 sg168 heartbeat: [7096]: debug: RscMgmtProc 'status' exited code 1
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: Get a repnodes msg from sg169
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: nodelist received:sg168 sg169
Dec 30 05:58:49 sg168 heartbeat: [7096]: info: Comm_now_up(): updating status to active
Dec 30 05:58:49 sg168 heartbeat: [7096]: info: Local status now set to: 'active'
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: Sending local starting msg: resourcestate = 0
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 0
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: Get a reqnodes message from sg169
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: get_delnodelist: delnodelist=
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: Forking temp process write_hostcachedata
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: Forking temp process write_delcachedata
Dec 30 05:58:49 sg168 heartbeat: [7096]: info: Managed write_hostcachedata process 7110 exited with return code 0.
Dec 30 05:58:49 sg168 heartbeat: [7096]: info: Status update for node sg169: status active
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: Status seqno: 6 msgtime: 1356834536
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: StartNextRemoteRscReq() - calling hook
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: notify_world: invoking harc: OLD status: active
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: Process [status] started pid 7112
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: Starting notify process [status]
Dec 30 05:58:49 sg168 heartbeat: [7096]: info: AnnounceTakeover(local 0, foreign 1, reason 'HB_R_BOTHSTARTING' (0))
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: process_resources: other now unstable
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: Sending hold resources msg: none, stable=0 # <none>
Dec 30 05:58:49 sg168 heartbeat: [7096]: info: STATE 1 => 3
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 3
Dec 30 05:58:49 sg168 heartbeat: [7096]: info: STATE 3 => 2
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 2
Dec 30 05:58:49 sg168 heartbeat: [7096]: info: Managed write_delcachedata process 7111 exited with return code 0.
Dec 30 05:58:49 sg168 heartbeat: [7112]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Dec 30 05:58:49 sg168 heartbeat: [7112]: debug: notify_world: Running harc status
Dec 30 05:58:49 sg168 heartbeat: [7096]: WARN: Managed status process 7112 exited with return code 1.
Dec 30 05:58:49 sg168 heartbeat: [7096]: debug: RscMgmtProc 'status' exited code 1
Dec 30 05:58:59 sg168 heartbeat: [7096]: info: remote resource transition completed.
Dec 30 05:58:59 sg168 heartbeat: [7096]: debug: Sending hold resources msg: none, stable=0 # <none>
Dec 30 05:58:59 sg168 heartbeat: [7096]: info: STATE 2 => 3
Dec 30 05:58:59 sg168 heartbeat: [7096]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 3
Dec 30 05:58:59 sg168 heartbeat: [7096]: debug: Calling PerformAutoFailback()
Dec 30 05:58:59 sg168 heartbeat: [7096]: info: other_holds_resources: 1
Dec 30 05:58:59 sg168 heartbeat: [7096]: info: remote resource transition completed.
Dec 30 05:58:59 sg168 heartbeat: [7096]: debug: Process [req_our_resources(ask)] started pid 7118
Dec 30 05:58:59 sg168 heartbeat: [7096]: debug: Sending hold resources msg: local, stable=1 # <none>
Dec 30 05:58:59 sg168 heartbeat: [7096]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
Dec 30 05:58:59 sg168 heartbeat: [7096]: info: Initial resource acquisition complete (T_RESOURCES(us))
Dec 30 05:58:59 sg168 heartbeat: [7096]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 3
Dec 30 05:58:59 sg168 heartbeat: [7096]: debug: Calling PerformAutoFailback()
Dec 30 05:58:59 sg168 heartbeat: [7096]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(them)' (1))
Dec 30 05:58:59 sg168 heartbeat: [7096]: info: STATE 3 => 4
Dec 30 05:58:59 sg168 heartbeat: [7096]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Dec 30 05:58:59 sg168 heartbeat: [7118]: debug: req_our_resources(/usr/local/share/heartbeat/ResourceManager listkeys sg168)
Dec 30 05:59:00 sg168 heartbeat: [7118]: ERROR: pclose(/usr/local/share/heartbeat/ResourceManager listkeys sg168) exited with return code 1
Dec 30 05:59:00 sg168 heartbeat: [7118]: ERROR: [/usr/local/share/heartbeat/ResourceManager listkeys sg168] exited with return code 1
Dec 30 05:59:00 sg168 heartbeat: [7118]: info: No local resources [/usr/local/share/heartbeat/ResourceManager listkeys sg168] to acquire.
Dec 30 05:59:00 sg168 heartbeat: [7118]: debug: Sending hold resources msg: local, stable=1 # req_our_resources()
Dec 30 05:59:00 sg168 heartbeat: [7118]: info: FIFO message [type resource] written rc=81
Dec 30 05:59:00 sg168 heartbeat: [7096]: info: AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Dec 30 05:59:00 sg168 heartbeat: [7096]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Dec 30 05:59:00 sg168 heartbeat: [7096]: info: other_holds_resources: 1
Dec 30 05:59:00 sg168 heartbeat: [7096]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Dec 30 05:59:00 sg168 heartbeat: [7096]: info: other_holds_resources: 1
Dec 30 05:59:00 sg168 heartbeat: [7096]: debug: hb_rsc_isstable: ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0, going_standby: 0, standby running(ms): 0, resourcestate: 4
Dec 30 05:59:00 sg168 heartbeat: [7096]: info: Managed req_our_resources(ask) process 7118 exited with return code 0.
Dec 30 05:59:00 sg168 heartbeat: [7096]: debug: RscMgmtProc 'req_our_resources(ask)' exited code 0
Dec 30 06:00:15 sg168 heartbeat: [7096]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 120 ms (> 50 ms) (GSource: 0x8a4c390)

The configuration on the other node is the same as above. I have read the documentation, but it did not help.

Thank you so much,
Ali
