On 08.04.2013 03:54, Andrew Beekhof wrote:
Looks like pacemaker is already running. How are you trying to start pacemaker?
I started it as usual with the init script:

service heartbeat start

This produces the following error messages:

Apr 5 13:35:11 lb1 pengine[29465]: error: main: Failed to create IPC server: shutting down and inhibiting respawn
Apr 5 13:35:27 lb1 pengine[29468]: error: qb_ipcs_us_publish: Could not bind AF_UNIX (): Address already in use (98)
Apr 5 13:35:27 lb1 pengine[29468]: error: mainloop_add_ipc_server: Could not start pengine IPC server: Address already in use (-98)

The cluster node also does not start completely. Here is the complete log sequence:

Apr 5 13:09:43 lb1 heartbeat: [29423]: info: Starting "/usr/lib64/heartbeat/crmd" as uid 499 gid 498 (pid 29423)
Apr 5 13:09:43 lb1 heartbeat: [29422]: info: Starting "/usr/lib64/heartbeat/attrd" as uid 499 gid 498 (pid 29422)
Apr 5 13:09:43 lb1 heartbeat: [29420]: info: Starting "/usr/lib64/heartbeat/lrmd -r" as uid 0 gid 0 (pid 29420)
Apr 5 13:09:43 lb1 heartbeat: [29421]: info: Starting "/usr/lib64/heartbeat/stonithd" as uid 0 gid 0 (pid 29421)
Apr 5 13:09:43 lb1 heartbeat: [29419]: info: Starting "/usr/lib64/heartbeat/cib" as uid 499 gid 498 (pid 29419)
Apr 5 13:09:43 lb1 heartbeat: [29418]: info: Starting "/usr/lib64/heartbeat/ccm" as uid 499 gid 498 (pid 29418)
Apr 5 13:09:43 lb1 lrmd: [29420]: info: max-children set to 4 (1 processors online)
Apr 5 13:09:44 lb1 lrmd: [29420]: info: enabling coredumps
Apr 5 13:09:44 lb1 lrmd: [29420]: info: Started.
Apr 5 13:09:44 lb1 ccm: [29418]: info: Hostname: lb1
Apr 5 13:09:44 lb1 heartbeat: [29409]: info: the send queue length from heartbeat to client ccm is set to 1024
Apr 5 13:09:44 lb1 attrd[29422]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
Apr 5 13:09:44 lb1 heartbeat: [29409]: info: the send queue length from heartbeat to client attrd is set to 1024
Apr 5 13:09:44 lb1 crmd[29423]: notice: main: CRM Git Version: 2a917dd
Apr 5 13:09:44 lb1 stonith-ng[29421]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
Apr 5 13:09:44 lb1 attrd[29422]: notice: main: Starting mainloop...
Apr 5 13:09:44 lb1 heartbeat: [29409]: info: the send queue length from heartbeat to client stonith-ng is set to 1024
Apr 5 13:09:44 lb1 pengine[29426]: error: qb_ipcs_us_publish: Could not bind AF_UNIX (): Address already in use (98)
Apr 5 13:09:44 lb1 pengine[29426]: error: mainloop_add_ipc_server: Could not start pengine IPC server: Address already in use (-98)
Apr 5 13:09:44 lb1 pengine[29426]: error: main: Failed to create IPC server: shutting down and inhibiting respawn
Apr 5 13:09:44 lb1 cib[29419]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
Apr 5 13:09:44 lb1 crmd[29423]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Apr 5 13:09:44 lb1 crmd[29423]: error: crmdManagedChildDied: Child process pengine exited (pid=29426, rc=100)
Apr 5 13:09:44 lb1 heartbeat: [29409]: WARN: 1 lost packet(s) for [lb2] [74316:74318]
Apr 5 13:09:44 lb1 heartbeat: [29409]: info: No pkts missing from lb2!
Apr 5 13:09:44 lb1 heartbeat: [29409]: info: the send queue length from heartbeat to client cib is set to 1024
Apr 5 13:09:45 lb1 cib[29419]: notice: cib_server_process_diff: Not applying diff 0.44.18 -> 0.44.19 (sync in progress)
Apr 5 13:09:45 lb1 cib[29419]: notice: cib_server_process_diff: Not applying diff 0.44.19 -> 0.44.20 (sync in progress)
Apr 5 13:09:45 lb1 cib[29419]: notice: cib_server_process_diff: Not applying diff 0.44.20 -> 0.44.21 (sync in progress)
Apr 5 13:09:45 lb1 cib[29419]: notice: cib_server_process_diff: Not applying diff 0.44.21 -> 0.44.22 (sync in progress)
Apr 5 13:09:45 lb1 cib[29419]: notice: cib_server_process_diff: Not applying diff 0.44.22 -> 0.44.23 (sync in progress)
Apr 5 13:09:45 lb1 stonith-ng[29421]: notice: setup_cib: Watching for stonith topology changes
Apr 5 13:09:46 lb1 crmd[29423]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
Apr 5 13:09:46 lb1 heartbeat: [29409]: info: the send queue length from heartbeat to client crmd is set to 1024
Apr 5 13:09:47 lb1 cib[29419]: notice: crm_update_peer_state: crm_update_ccm_node: Node lb2[1] - state is now member (was (null))
Apr 5 13:09:47 lb1 cib[29419]: notice: crm_update_peer_state: crm_update_ccm_node: Node lb1[0] - state is now member (was (null))
Apr 5 13:09:48 lb1 crmd[29423]: warning: do_lrm_control: Failed to sign on to the LRM 1 (30 max) times
Apr 5 13:09:48 lb1 crmd[29423]: notice: crmd_client_status_callback: Status update: Client lb1/crmd now has status [join] (DC=false)
Apr 5 13:09:48 lb1 crmd[29423]: notice: crmd_client_status_callback: Status update: Client lb1/crmd now has status [online] (DC=false)
Apr 5 13:09:48 lb1 crmd[29423]: notice: crmd_client_status_callback: Status update: Client lb2/crmd now has status [online] (DC=false)
Apr 5 13:09:48 lb1 crmd[29423]: warning: do_lrm_control: Failed to sign on to the LRM 2 (30 max) times
Apr 5 13:09:48 lb1 crmd[29423]: warning: do_lrm_control: Failed to sign on to the LRM 3 (30 max) times
Apr 5 13:09:50 lb1 crmd[29423]: warning: do_lrm_control: Failed to sign on to the LRM 4 (30 max) times
Apr 5 13:09:52 lb1 crmd[29423]: warning: do_lrm_control: Failed to sign on to the LRM 5 (30 max) times
Apr 5 13:09:54 lb1 crmd[29423]: warning: do_lrm_control: Failed to sign on to the LRM 6 (30 max) times
Apr 5 13:09:56 lb1 crmd[29423]: warning: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times
....
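
For reference, the "(98)" in the pengine lines is errno 98, which is EADDRINUSE on Linux. The following is only a generic sketch of how bind() on an AF_UNIX socket can return that error; it is not pacemaker or libqb code. The socket name "demo-ipc" and the helper names are made up for illustration, and the sketch assumes a filesystem-path socket, which the logs above do not confirm for this setup.

```python
import errno
import os
import socket
import tempfile


def try_bind(path):
    """Bind an AF_UNIX stream socket to path; return (socket, errno or None)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError as exc:
        sock.close()
        return None, exc.errno
    return sock, None


def demo():
    """Return the errno of a second bind, and of a bind onto a leftover file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical socket name, not the one libqb uses.
        path = os.path.join(tmp, "demo-ipc")
        first, err = try_bind(path)
        assert err is None

        # Case 1: another socket is still bound to the same address.
        _, while_open = try_bind(path)

        # Case 2: the owner has closed its socket, but the socket file
        # was never unlinked, so the address is still taken.
        first.close()
        _, after_close = try_bind(path)
        return while_open, after_close


if __name__ == "__main__":
    while_open, after_close = demo()
    print("second bind while open:", errno.errorcode.get(while_open))
    print("bind onto leftover file:", errno.errorcode.get(after_close))
```

Both cases fail the same way, so the error message alone does not say whether a live process or a leftover socket is holding the address.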

# /usr/libexec/pacemaker/pengine -V
info: qb_ipcs_us_publish: server name: pengine
error: qb_ipcs_us_publish: Could not bind AF_UNIX (): Address already in use (98)
info: qb_ipcs_us_withdraw: withdrawing server sockets
info: qb_ipcs_us_withdraw: withdrawing server sockets
error: mainloop_add_ipc_server: Could not start pengine IPC server: Address already in use (-98)
error: main: Failed to create IPC server: shutting down and inhibiting respawn
info: crm_xml_cleanup: Cleaning up memory from libxml2

I just tried this to isolate the problem. When I tried it, there was definitely no other cluster component running!

Thanks
Andreas