On 16/05/2013, at 10:21 AM, Andrew Widdersheim <awiddersh...@hotmail.com> wrote:
> These are the libqb versions:
>
> libqb-devel-0.14.2-3.el6.x86_64
> libqb-0.14.2-3.el6.x86_64
>
> Here is a process listing where lrmd is running:
> [root@node1 ~]# ps auxwww | egrep "heartbeat|pacemaker"
> root      9553  0.1  0.7  52420  7424 ?  SLs May14  1:39 heartbeat: master control process
> root      9556  0.0  0.7  52260  7264 ?  SL  May14  0:10 heartbeat: FIFO reader
> root      9557  0.0  0.7  52256  7260 ?  SL  May14  1:01 heartbeat: write: mcast eth0
> root      9558  0.0  0.7  52256  7260 ?  SL  May14  0:14 heartbeat: read: mcast eth0
> root      9559  0.0  0.7  52256  7260 ?  SL  May14  0:23 heartbeat: write: bcast eth1
> root      9560  0.0  0.7  52256  7260 ?  SL  May14  0:13 heartbeat: read: bcast eth1
> 498       9563  0.0  0.2  36908  2392 ?  S   May14  0:10 /usr/lib64/heartbeat/ccm
> 498       9564  0.0  1.0  85084 10704 ?  S   May14  0:25 /usr/lib64/heartbeat/cib
> root      9565  0.0  0.1  44588  1896 ?  S   May14  0:04 /usr/lib64/heartbeat/lrmd -r

Heartbeat is starting the wrong lrmd by the looks of it.
Is /usr/lib64/heartbeat/lrmd the same as /usr/libexec/pacemaker/lrmd ?

> root      9566  0.0  0.3  83544  3988 ?  S   May14  0:10 /usr/lib64/heartbeat/stonithd
> 498       9567  0.0  0.3  78668  3248 ?  S   May14  0:10 /usr/lib64/heartbeat/attrd
> 498      26534  0.0  0.3  92364  3748 ?  S   16:05  0:00 /usr/lib64/heartbeat/crmd
> 498      26535  0.0  0.2  72840  2708 ?  S   16:05  0:00 /usr/libexec/pacemaker/pengine
>
> Here are the logs at startup until the "Failed to sign on" message just starts to repeat over and over:
> May 15 16:07:06 node1 crmd[26621]: notice: main: CRM Git Version: b060cae
> May 15 16:07:06 node1 attrd[26620]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> May 15 16:07:06 node1 attrd[26620]: notice: main: Starting mainloop...
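One way to answer that question (a diagnostic sketch, not something from the thread): inspect both paths seen in the logs above. The paths are taken from the process listing; the `rpm -qf` query assumes an RPM-based system like the EL6 hosts shown here.

```shell
# Compare the two candidate lrmd binaries from the logs above.
for f in /usr/lib64/heartbeat/lrmd /usr/libexec/pacemaker/lrmd; do
    if [ -e "$f" ]; then
        ls -l "$f"                                   # a symlink here would explain a lot
        command -v rpm >/dev/null && rpm -qf "$f"    # which package owns this file
        md5sum "$f"                                  # identical binaries hash the same
    else
        echo "$f: not present"
    fi
done
```

If the two files hash differently and come from different packages, heartbeat is launching a different lrmd than the one pacemaker's crmd expects to sign on to.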
> May 15 16:07:06 node1 stonith-ng[26619]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> May 15 16:07:06 node1 cib[26617]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> May 15 16:07:06 node1 lrmd: [26618]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
> May 15 16:07:06 node1 lrmd: [26618]: info: max-children set to 4 (1 processors online)
> May 15 16:07:06 node1 lrmd: [26618]: info: enabling coredumps
> May 15 16:07:06 node1 lrmd: [26618]: info: Started.
> May 15 16:07:06 node1 cib[26617]: warning: ccm_connect: CCM Activation failed
> May 15 16:07:06 node1 cib[26617]: warning: ccm_connect: CCM Connection failed 1 times (30 max)
> May 15 16:07:06 node1 ccm: [26616]: WARN: Initializing connection to logging daemon failed. Logging daemon may not be running
> May 15 16:07:06 node1 ccm: [26616]: info: Hostname: node1
> May 15 16:07:07 node1 crmd[26621]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> May 15 16:07:09 node1 cib[26617]: warning: ccm_connect: CCM Activation failed
> May 15 16:07:09 node1 cib[26617]: warning: ccm_connect: CCM Connection failed 2 times (30 max)
> May 15 16:07:10 node1 crmd[26621]: warning: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
> May 15 16:07:13 node1 crmd[26621]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> May 15 16:07:14 node1 cib[26617]: notice: crm_update_peer_state: crm_update_ccm_node: Node node2[1] - state is now member (was (null))
> May 15 16:07:14 node1 cib[26617]: notice: crm_update_peer_state: crm_update_ccm_node: Node node1[0] - state is now member (was (null))
> May 15 16:07:15 node1 crmd[26621]: warning: do_lrm_control: Failed to sign on to the LRM 1 (30 max) times
>
> Here are the repeating message pieces:
> May 15 16:06:09 node1 crmd[26534]: error: do_lrm_control: Failed to sign on to the LRM 30 (max) times
> May 15 16:06:09 node1 crmd[26534]: error: do_log: FSA: Input I_ERROR from do_lrm_control() received in state S_STARTING
> May 15 16:06:09 node1 crmd[26534]: warning: do_state_transition: State transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_lrm_control ]
> May 15 16:06:09 node1 crmd[26534]: warning: do_recover: Fast-tracking shutdown in response to errors
> May 15 16:06:09 node1 crmd[26534]: error: do_started: Start cancelled... S_RECOVERY
> May 15 16:06:09 node1 crmd[26534]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> May 15 16:06:09 node1 crmd[26534]: notice: do_lrm_control: Disconnected from the LRM
> May 15 16:06:09 node1 ccm: [9563]: info: client (pid=26534) removed from ccm
> May 15 16:06:09 node1 crmd[26534]: error: do_exit: Could not recover from internal error
> May 15 16:06:09 node1 crmd[26534]: error: crm_abort: crm_glib_handler: Forked child 26540 to record non-fatal assert at logging.c:63 : g_hash_table_size: assertion `hash_table != NULL' failed
> May 15 16:06:09 node1 crmd[26534]: error: crm_abort: crm_glib_handler: Forked child 26541 to record non-fatal assert at logging.c:63 : g_hash_table_destroy: assertion `hash_table != NULL' failed
> May 15 16:06:09 node1 crmd[26542]: notice: main: CRM Git Version: b060cae
> May 15 16:06:09 node1 crmd[26542]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> May 15 16:06:10 node1 crmd[26542]: warning: do_lrm_control: Failed to sign on to the LRM 1 (30 max) times
> May 15 16:06:10 node1 crmd[26542]: notice: crmd_client_status_callback: Status update: Client node1/crmd now has status [join] (DC=false)
> May 15 16:06:10 node1 crmd[26542]: notice: crmd_client_status_callback: Status update: Client node1/crmd now has status [online] (DC=false)
> May 15 16:06:10 node1 crmd[26542]: notice: crmd_client_status_callback: Status update: Client node2/crmd now has status [online] (DC=false)
> May 15 16:06:11 node1 crmd[26542]: warning: do_lrm_control: Failed to sign on to the LRM 2 (30 max) times
> May 15 16:06:12 node1 crmd[26542]: warning: do_lrm_control: Failed to sign on to the LRM 3 (30 max) times
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org