On 16/05/2013, at 10:21 AM, Andrew Widdersheim <awiddersh...@hotmail.com> wrote:
> These are the libqb versions:
>
> libqb-devel-0.14.2-3.el6.x86_64
> libqb-0.14.2-3.el6.x86_64
>
> Here is a process listing where lrmd is running:
> [root@node1 ~]# ps auxwww | egrep "heartbeat|pacemaker"
> root      9553  0.1  0.7  52420  7424 ?  SLs May14  1:39 heartbeat: master control process
> root      9556  0.0  0.7  52260  7264 ?  SL  May14  0:10 heartbeat: FIFO reader
> root      9557  0.0  0.7  52256  7260 ?  SL  May14  1:01 heartbeat: write: mcast eth0
> root      9558  0.0  0.7  52256  7260 ?  SL  May14  0:14 heartbeat: read: mcast eth0
> root      9559  0.0  0.7  52256  7260 ?  SL  May14  0:23 heartbeat: write: bcast eth1
> root      9560  0.0  0.7  52256  7260 ?  SL  May14  0:13 heartbeat: read: bcast eth1
> 498       9563  0.0  0.2  36908  2392 ?  S   May14  0:10 /usr/lib64/heartbeat/ccm
> 498       9564  0.0  1.0  85084 10704 ?  S   May14  0:25 /usr/lib64/heartbeat/cib
> root      9565  0.0  0.1  44588  1896 ?  S   May14  0:04 /usr/lib64/heartbeat/lrmd -r

Heartbeat is starting the wrong lrmd by the looks of it.
Is /usr/lib64/heartbeat/lrmd the same as /usr/libexec/pacemaker/lrmd ?

> root      9566  0.0  0.3  83544  3988 ?  S   May14  0:10 /usr/lib64/heartbeat/stonithd
> 498       9567  0.0  0.3  78668  3248 ?  S   May14  0:10 /usr/lib64/heartbeat/attrd
> 498      26534  0.0  0.3  92364  3748 ?  S   16:05  0:00 /usr/lib64/heartbeat/crmd
> 498      26535  0.0  0.2  72840  2708 ?  S   16:05  0:00 /usr/libexec/pacemaker/pengine
>
> Here are the logs at startup until the "Failed to sign on" message just starts to repeat over and over:
> May 15 16:07:06 node1 crmd[26621]: notice: main: CRM Git Version: b060cae
> May 15 16:07:06 node1 attrd[26620]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> May 15 16:07:06 node1 attrd[26620]: notice: main: Starting mainloop...
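One way to answer that question (a diagnostic sketch, not something from the thread): inspect both paths seen in the logs above. The paths are taken from the process listing; the `rpm -qf` query assumes an RPM-based system like the EL6 hosts shown here.

```shell
# Compare the two candidate lrmd binaries from the logs above.
for f in /usr/lib64/heartbeat/lrmd /usr/libexec/pacemaker/lrmd; do
    if [ -e "$f" ]; then
        ls -l "$f"                                   # a symlink here would explain a lot
        command -v rpm >/dev/null && rpm -qf "$f"    # which package owns this file
        md5sum "$f"                                  # identical binaries hash the same
    else
        echo "$f: not present"
    fi
done
```

If the two files hash differently and come from different packages, heartbeat is launching a different lrmd than the one pacemaker's crmd expects to sign on to.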
> May 15 16:07:06 node1 stonith-ng[26619]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> May 15 16:07:06 node1 cib[26617]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> May 15 16:07:06 node1 lrmd: [26618]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
> May 15 16:07:06 node1 lrmd: [26618]: info: max-children set to 4 (1 processors online)
> May 15 16:07:06 node1 lrmd: [26618]: info: enabling coredumps
> May 15 16:07:06 node1 lrmd: [26618]: info: Started.
> May 15 16:07:06 node1 cib[26617]: warning: ccm_connect: CCM Activation failed
> May 15 16:07:06 node1 cib[26617]: warning: ccm_connect: CCM Connection failed 1 times (30 max)
> May 15 16:07:06 node1 ccm: [26616]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
> May 15 16:07:06 node1 ccm: [26616]: info: Hostname: node1
> May 15 16:07:07 node1 crmd[26621]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> May 15 16:07:09 node1 cib[26617]: warning: ccm_connect: CCM Activation failed
> May 15 16:07:09 node1 cib[26617]: warning: ccm_connect: CCM Connection failed 2 times (30 max)
> May 15 16:07:10 node1 crmd[26621]: warning: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
> May 15 16:07:13 node1 crmd[26621]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> May 15 16:07:14 node1 cib[26617]: notice: crm_update_peer_state: crm_update_ccm_node: Node node2[1] - state is now member (was (null))
> May 15 16:07:14 node1 cib[26617]: notice: crm_update_peer_state: crm_update_ccm_node: Node node1[0] - state is now member (was (null))
> May 15 16:07:15 node1 crmd[26621]: warning: do_lrm_control: Failed to sign on to the LRM 1 (30 max) times
>
> Here are the repeating message pieces:
> May 15 16:06:09 node1 crmd[26534]: error: do_lrm_control: Failed to sign on to the LRM 30 (max) times
> May 15 16:06:09 node1 crmd[26534]: error: do_log: FSA: Input I_ERROR from do_lrm_control() received in state S_STARTING
> May 15 16:06:09 node1 crmd[26534]: warning: do_state_transition: State transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_lrm_control ]
> May 15 16:06:09 node1 crmd[26534]: warning: do_recover: Fast-tracking shutdown in response to errors
> May 15 16:06:09 node1 crmd[26534]: error: do_started: Start cancelled... S_RECOVERY
> May 15 16:06:09 node1 crmd[26534]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> May 15 16:06:09 node1 crmd[26534]: notice: do_lrm_control: Disconnected from the LRM
> May 15 16:06:09 node1 ccm: [9563]: info: client (pid=26534) removed from ccm
> May 15 16:06:09 node1 crmd[26534]: error: do_exit: Could not recover from internal error
> May 15 16:06:09 node1 crmd[26534]: error: crm_abort: crm_glib_handler: Forked child 26540 to record non-fatal assert at logging.c:63 : g_hash_table_size: assertion `hash_table != NULL' failed
> May 15 16:06:09 node1 crmd[26534]: error: crm_abort: crm_glib_handler: Forked child 26541 to record non-fatal assert at logging.c:63 : g_hash_table_destroy: assertion `hash_table != NULL' failed
> May 15 16:06:09 node1 crmd[26542]: notice: main: CRM Git Version: b060cae
> May 15 16:06:09 node1 crmd[26542]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> May 15 16:06:10 node1 crmd[26542]: warning: do_lrm_control: Failed to sign on to the LRM 1 (30 max) times
> May 15 16:06:10 node1 crmd[26542]: notice: crmd_client_status_callback: Status update: Client node1/crmd now has status [join] (DC=false)
> May 15 16:06:10 node1 crmd[26542]: notice: crmd_client_status_callback: Status update: Client node1/crmd now has status [online] (DC=false)
> May 15 16:06:10 node1 crmd[26542]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [online] (DC=false)
> May 15 16:06:11 node1 crmd[26542]: warning: do_lrm_control: Failed to sign on to the LRM 2 (30 max) times
> May 15 16:06:12 node1 crmd[26542]: warning: do_lrm_control: Failed to sign on to the LRM 3 (30 max) times
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org