On 19/07/2013, at 12:20 AM, K Mehta <kiranmehta1...@gmail.com> wrote:

> Hi,
> 
> I have a two-node cluster with a few resources configured on it. On vqa12, 
> CRMd dies due to some internal error. It is not clear why CRMd decides to die 
> on May 5 at 22:14:50 on system vqa12.

It's because of:

May 05 22:14:50 [3518] vqa12       crmd:    error: cib_quorum_update_complete:  
Quorum update 135 failed

Could the machine have been overloaded? That's usually the reason.
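
One way to check is to line up the load on vqa12 around 22:14:50 with the failed
update. A rough sketch, assuming sysstat is collecting data and corosync writes
to the usual /var/log/cluster/corosync.log (both are assumptions, adjust paths
for your setup):

  # Confirm the failing quorum update and see what led up to it
  grep -n "cib_quorum_update_complete" /var/log/cluster/corosync.log

  # Run-queue length and load average recorded by sysstat around the failure
  # (sa05 is the data file for May 5; -s/-e bound the time window)
  sar -q -f /var/log/sa/sa05 -s 22:10:00 -e 22:20:00

If the load average spikes in that window, overload is the likely explanation.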

> ================
> May 05 22:14:50 [3518] vqa12       crmd:     info: do_exit:     Performing 
> A_EXIT_0 - gracefully exiting the CRMd
> ================
> 
> 
> Corosync logs vqa12
> ==============
> 
> May 05 22:12:53 [3517] vqa12    pengine:     info: determine_online_status:   
>   Node vqa12 is online
> May 05 22:12:53 [3517] vqa12    pengine:     info: determine_online_status:   
>   Node vqa11 is online
> May 05 22:12:53 [3517] vqa12    pengine:     info: find_anonymous_clone:      
>   Internally renamed vha-94b33532-15ba-4923-a920-ab9268ccd856 on vqa12 to 
> vha-94b33532-15ba-4923-a920-ab9268ccd856:0
> May 05 22:12:53 [3517] vqa12    pengine:     info: native_print:        
> vgc_virtual_ip  (ocf::heartbeat:IPaddr2):       Started vqa12
> May 05 22:12:53 [3517] vqa12    pengine:     info: clone_print:          
> Master/Slave Set: ms-94b33532-15ba-4923-a920-ab9268ccd856 
> [vha-94b33532-15ba-4923-a920-ab9268ccd856]
> May 05 22:12:53 [3517] vqa12    pengine:     info: short_print:              
> Masters: [ vqa12 ]
> May 05 22:12:53 [3517] vqa12    pengine:     info: short_print:              
> Stopped: [ vha-94b33532-15ba-4923-a920-ab9268ccd856:1 ]
> May 05 22:12:53 [3517] vqa12    pengine:     info: master_color:        
> Promoting vha-94b33532-15ba-4923-a920-ab9268ccd856:0 (Master vqa12)
> May 05 22:12:53 [3517] vqa12    pengine:     info: master_color:        
> ms-94b33532-15ba-4923-a920-ab9268ccd856: Promoted 1 instances of a possible 1 
> to master
> May 05 22:12:53 [3517] vqa12    pengine:     info: RecurringOp:          
> Start recurring monitor (31s) for vha-94b33532-15ba-4923-a920-ab9268ccd856:1 
> on vqa11
> May 05 22:12:53 [3517] vqa12    pengine:     info: RecurringOp:          
> Start recurring monitor (31s) for vha-94b33532-15ba-4923-a920-ab9268ccd856:1 
> on vqa11
> May 05 22:12:53 [3517] vqa12    pengine:     info: LogActions:  Leave   
> vgc_virtual_ip  (Started vqa12)
> May 05 22:12:53 [3517] vqa12    pengine:     info: LogActions:  Leave   
> vha-94b33532-15ba-4923-a920-ab9268ccd856:0      (Master vqa12)
> May 05 22:12:53 [3517] vqa12    pengine:   notice: LogActions:  Start   
> vha-94b33532-15ba-4923-a920-ab9268ccd856:1      (vqa11)
> May 05 22:12:53 [3517] vqa12    pengine:   notice: process_pe_message:  
> Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-99.bz2
> May 05 22:12:53 [3518] vqa12       crmd:     info: do_state_transition:       
>   State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> May 05 22:12:53 [3518] vqa12       crmd:     info: do_te_invoke:        
> Processing graph 7 (ref=pe_calc-dc-1367817173-63) derived from 
> /var/lib/pacemaker/pengine/pe-input-99.bz2
> May 05 22:12:53 [3518] vqa12       crmd:     info: te_rsc_command:      
> Initiating action 5: monitor vgc_virtual_ip_monitor_0 on vqa11
> May 05 22:12:53 [3518] vqa12       crmd:     info: te_rsc_command:      
> Initiating action 6: monitor 
> vha-94b33532-15ba-4923-a920-ab9268ccd856:1_monitor_0 on vqa11
> May 05 22:12:56 [3518] vqa12       crmd:     info: te_rsc_command:      
> Initiating action 4: probe_complete probe_complete on vqa11 - no waiting
> May 05 22:12:56 [3518] vqa12       crmd:     info: te_rsc_command:      
> Initiating action 14: start 
> vha-94b33532-15ba-4923-a920-ab9268ccd856:1_start_0 on vqa11
> May 05 22:12:56 [3518] vqa12       crmd:     info: te_rsc_command:      
> Initiating action 15: monitor 
> vha-94b33532-15ba-4923-a920-ab9268ccd856:1_monitor_31000 on vqa11
> May 05 22:12:56 [3518] vqa12       crmd:   notice: run_graph:   Transition 7 
> (Complete=8, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacemaker/pengine/pe-input-99.bz2): Complete
> May 05 22:12:56 [3518] vqa12       crmd:   notice: do_state_transition:       
>   State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> May 05 22:14:50 [3518] vqa12       crmd:    error: 
> cib_quorum_update_complete:  Quorum update 135 failed
> May 05 22:14:50 [3518] vqa12       crmd:    error: do_log:      FSA: Input 
> I_ERROR from cib_quorum_update_complete() received in state S_IDLE
> May 05 22:14:50 [3518] vqa12       crmd:   notice: do_state_transition:       
>   State transition S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL 
> origin=cib_quorum_update_complete ]
> May 05 22:14:50 [3518] vqa12       crmd:    error: do_recover:  Action 
> A_RECOVER (0000000001000000) not supported
> May 05 22:14:50 [3518] vqa12       crmd:  warning: do_election_vote:    Not 
> voting in election, we're in state S_RECOVERY
> May 05 22:14:50 [3518] vqa12       crmd:     info: do_dc_release:       DC 
> role released
> May 05 22:14:50 [3518] vqa12       crmd:     info: pe_ipc_destroy:      
> Connection to the Policy Engine released
> May 05 22:14:50 [3518] vqa12       crmd:     info: do_te_control:       
> Transitioner is now inactive
> May 05 22:14:50 [3518] vqa12       crmd:    error: do_log:      FSA: Input 
> I_TERMINATE from do_recover() received in state S_RECOVERY
> May 05 22:14:50 [3518] vqa12       crmd:     info: do_state_transition:       
>   State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE 
> cause=C_FSA_INTERNAL orig
> May 05 22:14:50 [3518] vqa12       crmd:     info: do_shutdown:         
> Disconnecting STONITH...
> May 05 22:14:50 [3518] vqa12       crmd:     info: 
> tengine_stonith_connection_destroy:  Fencing daemon disconnected
> May 05 22:14:50 [3515] vqa12       lrmd:     info: cancel_recurring_action:   
>   Cancelling operation vha-94b33532-15ba-4923-a920-ab9268ccd856_monitor_30000
> May 05 22:14:50 [3518] vqa12       crmd:    error: verify_stopped:      
> Resource vgc_virtual_ip was active at shutdown.  You may ignore this error if 
> it is unmanaged.
> May 05 22:14:50 [3518] vqa12       crmd:    error: verify_stopped:      
> Resource vha-94b33532-15ba-4923-a920-ab9268ccd856 was active at shutdown.  
> You may ignore this error if it is unmanaged.
> May 05 22:14:50 [3518] vqa12       crmd:     info: lrmd_api_disconnect:       
>   Disconnecting from lrmd service
> May 05 22:14:50 [3515] vqa12       lrmd:     info: lrmd_ipc_destroy:    LRMD 
> client disconnecting 0x14bf9f0 - name: crmd id: 
> a7e581bd-c0fe-4d3f-9734-acca62e868a8
> May 05 22:14:50 [3518] vqa12       crmd:     info: lrmd_connection_destroy:   
>   connection destroyed
> May 05 22:14:50 [3518] vqa12       crmd:     info: lrm_connection_destroy:    
>   LRM Connection disconnected
> May 05 22:14:50 [3518] vqa12       crmd:     info: do_lrm_control:      
> Disconnected from the LRM
> May 05 22:14:50 [3518] vqa12       crmd:     info: crm_cluster_disconnect:    
>   Disconnecting from cluster infrastructure: classic openais (with plugin)
> May 05 22:14:50 [3518] vqa12       crmd:   notice: terminate_cs_connection:   
>   Disconnecting from Corosync
> May 05 22:14:50 [3518] vqa12       crmd:     info: crm_cluster_disconnect:    
>   Disconnected from classic openais (with plugin)
> May 05 22:14:50 [3518] vqa12       crmd:     info: do_ha_control:       
> Disconnected from the cluster
> May 05 22:14:50 [3518] vqa12       crmd:     info: do_cib_control:      
> Disconnecting CIB
> May 05 22:14:50 corosync [pcmk  ] info: pcmk_ipc_exit: Client crmd 
> (conn=0x231dd00, async-conn=0x231dd00) left
> May 05 22:14:50 [3513] vqa12        cib:     info: cib_process_readwrite:     
>   We are now in R/O mode
> May 05 22:14:50 [3513] vqa12        cib:  warning: qb_ipcs_event_sendv:       
>   new_event_notification (3513-3518-15): Broken pipe (32)
> May 05 22:14:50 [3513] vqa12        cib:     info: crm_ipcs_send:       Event 
> 480 failed, size=162, to=0x2668ba0[3518], queue=1, retries=0, rc=-32: 
> <cib-reply t="cib" cib_op="cib_slave" cib_callid="156" 
> cib_clientid="4589219a-4c81-4a49-803c-df4cc8037f9a" cib_callopt="
> May 05 22:14:50 [3513] vqa12        cib:  warning: do_local_notify:     
> A-Sync reply to crmd failed: No message of desired type
> May 05 22:14:50 [3518] vqa12       crmd:     info: 
> crmd_cib_connection_destroy:         Connection to the CIB terminated...
> May 05 22:14:50 [3518] vqa12       crmd:     info: qb_ipcs_us_withdraw:       
>   withdrawing server sockets
> May 05 22:14:50 [3518] vqa12       crmd:     info: do_exit:     Performing 
> A_EXIT_0 - gracefully exiting the CRMd
> May 05 22:14:50 [3518] vqa12       crmd:    error: do_exit:     Could not 
> recover from internal error
> May 05 22:14:50 [3518] vqa12       crmd:     info: do_exit:     [crmd] 
> stopped (2)
> May 05 22:14:50 [3518] vqa12       crmd:     info: crmd_exit:   Dropping 
> I_PENDING: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_election_vote ]
> May 05 22:14:50 [3518] vqa12       crmd:     info: crmd_exit:   Dropping 
> I_RELEASE_SUCCESS: [ state=S_TERMINATE cause=C_FSA_INTERNAL 
> origin=do_dc_release ]
> May 05 22:14:50 [3518] vqa12       crmd:     info: crmd_exit:   Dropping 
> I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
> May 05 22:14:50 [3518] vqa12       crmd:     info: lrmd_api_disconnect:       
>   Disconnecting from lrmd service
> May 05 22:14:50 [3518] vqa12       crmd:     info: crm_xml_cleanup:     
> Cleaning up memory from libxml2
> May 05 22:14:50 [3507] vqa12 pacemakerd:    error: pcmk_child_exit:     Child 
> process crmd exited (pid=3518, rc=2)
> May 05 22:14:50 [3514] vqa12 stonith-ng:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:14:50 [3513] vqa12        cib:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:14:50 [3507] vqa12 pacemakerd:   notice: pcmk_process_exit:   
> Respawning failed child process: crmd
> May 05 22:14:50 [3507] vqa12 pacemakerd:     info: start_child:         
> Forked child 7806 for process crmd
> May 05 22:14:50 [3514] vqa12 stonith-ng:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:14:50 [3513] vqa12        cib:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:14:50 corosync [pcmk  ] WARN: route_ais_message: Sending message to 
> local.crmd failed: ipc delivery failed (rc=-2)
> May 05 22:14:50 [7806] vqa12       crmd:     info: crm_log_init:        
> Cannot change active directory to /var/lib/pacemaker/cores/hacluster: 
> Permission denied (13)
> May 05 22:14:50 [7806] vqa12       crmd:   notice: main:        CRM Git 
> Version: 394e906
> May 05 22:14:50 [7806] vqa12       crmd:     info: get_cluster_type:    
> Cluster type is: 'openais'
> 
> 
> 
> Corosync logs vqa11
> ===============
> 
> May 05 22:12:50 [3553] vqa11        cib:     info: cib_server_process_diff:   
>   Requesting re-sync from peer
> May 05 22:12:50 [3553] vqa11        cib:   notice: cib_server_process_diff:   
>   Not applying diff 0.7126.40 -> 0.7126.41 (sync in progress)
> May 05 22:12:50 corosync [CPG   ] chosen downlist: sender r(0) 
> ip(192.168.1.1) ; members(old:1 left:0)
> May 05 22:12:50 corosync [MAIN  ] Completed service synchronization, ready to 
> provide service.
> May 05 22:12:50 [3553] vqa11        cib:     info: cib_process_replace:       
>   Digest matched on replace from vqa12: dfa194fcf61b3e86b6a79b2506c41a1c
> May 05 22:12:50 [3553] vqa11        cib:     info: cib_process_replace:       
>   Replaced 0.7126.1 with 0.7126.42 from vqa12
> May 05 22:12:50 [3553] vqa11        cib:     info: cib_replace_notify:  
> Replaced: 0.7126.1 -> 0.7126.42 from vqa12
> May 05 22:12:51 [3554] vqa11 stonith-ng:     info: stonith_command:     
> Processed register from crmd.3558: OK (0)
> May 05 22:12:51 [3554] vqa11 stonith-ng:     info: stonith_command:     
> Processed st_notify from crmd.3558: OK (0)
> May 05 22:12:51 [3554] vqa11 stonith-ng:     info: stonith_command:     
> Processed st_notify from crmd.3558: OK (0)
> May 05 22:12:51 [3558] vqa11       crmd:     info: ais_dispatch_message:      
>   Membership 4120: quorum still lost
> May 05 22:12:51 [3558] vqa11       crmd:     info: crm_get_peer:        Node 
> <null> now has id: 33663168
> May 05 22:12:51 [3558] vqa11       crmd:   notice: crm_update_peer_state:     
>   crm_update_ais_node: Node (null)[33663168] - state is now member
> May 05 22:12:51 [3558] vqa11       crmd:     info: crm_update_peer:     
> crm_update_ais_node: Node (null): id=33663168 state=member addr=r(0) 
> ip(192.168.1.2)  (new) votes=0 born=0 seen=4120 
> proc=00000000000000000000000000000000
> May 05 22:12:51 [3558] vqa11       crmd:   notice: ais_dispatch_message:      
>   Membership 4120: quorum acquired
> May 05 22:12:51 [3558] vqa11       crmd:     info: crm_get_peer:        Node 
> 33663168 is now known as vqa12
> May 05 22:12:51 [3558] vqa11       crmd:     info: peer_update_callback:      
>   vqa12 is now member
> May 05 22:12:51 [3558] vqa11       crmd:     info: crm_get_peer:        Node 
> 33663168 has uuid vqa12
> May 05 22:12:51 [3558] vqa11       crmd:     info: crm_update_peer:     
> crm_update_ais_node: Node vqa12: id=33663168 state=member addr=r(0) 
> ip(192.168.1.2)  votes=1 (new) born=4092 seen=4120 
> proc=00000000000000000000000000000000
> May 05 22:12:51 [3558] vqa11       crmd:    error: crmd_ais_dispatch:   
> Recieving messages from a node we think is dead: vqa12[33663168]
> May 05 22:12:51 [3558] vqa11       crmd:     info: crm_update_peer_proc:      
>   crmd_ais_dispatch: Node vqa12[33663168] - ais is now online
> May 05 22:12:51 [3558] vqa11       crmd:     info: peer_update_callback:      
>   Client vqa12/peer now has status [offline] (DC=<null>)
> May 05 22:12:51 [3547] vqa11 pacemakerd:   notice: update_node_processes:     
>   0x13685e0 Node 33663168 now known as vqa12, was:
> May 05 22:12:51 [3558] vqa11       crmd:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:12:51 [3558] vqa11       crmd:     info: peer_update_callback:      
>   Client vqa12/peer now has status [online] (DC=<null>)
> May 05 22:12:51 [3554] vqa11 stonith-ng:     info: crm_get_peer:        Node 
> vqa12 now has id: 33663168
> May 05 22:12:51 [3554] vqa11 stonith-ng:     info: crm_get_peer:        Node 
> 33663168 is now known as vqa12
> May 05 22:12:51 [3554] vqa11 stonith-ng:     info: crm_get_peer:        Node 
> 33663168 has uuid vqa12
> May 05 22:12:51 [3554] vqa11 stonith-ng:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:12:51 [3558] vqa11       crmd:     info: update_dc:   Set DC to 
> vqa12 (3.0.7)
> May 05 22:12:51 [3553] vqa11        cib:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:12:51 [3558] vqa11       crmd:     info: erase_status_tag:    
> Deleting xpath: //node_state[@uname='vqa11']/transient_attributes
> May 05 22:12:51 [3558] vqa11       crmd:     info: update_attrd:        
> Connecting to attrd... 5 retries remaining
> May 05 22:12:51 [3558] vqa11       crmd:   notice: do_state_transition:       
>   State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE 
> origin=do_cl_join_finalize_respond ]
> May 05 22:12:51 [3556] vqa11      attrd:   notice: attrd_local_callback:      
>   Sending full refresh (origin=crmd)
> May 05 22:12:51 [3553] vqa11        cib:     info: cib_process_replace:       
>   Digest matched on replace from vqa12: 8e663b021f35b1f922b50f7404cb7032
> May 05 22:12:51 [3553] vqa11        cib:     info: cib_process_replace:       
>   Replaced 0.7126.47 with 0.7126.47 from vqa12
> May 05 22:12:53 [3555] vqa11       lrmd:     info: process_lrmd_get_rsc_info: 
>   Resource 'vgc_virtual_ip' not found (0 active resources)
> May 05 22:12:53 [3555] vqa11       lrmd:     info: process_lrmd_rsc_register: 
>   Added 'vgc_virtual_ip' to the rsc list (1 active resources)
> May 05 22:12:53 [3555] vqa11       lrmd:     info: process_lrmd_get_rsc_info: 
>   Resource 'vha-94b33532-15ba-4923-a920-ab9268ccd856' not found (1 active 
> resources)
> May 05 22:12:53 [3555] vqa11       lrmd:     info: process_lrmd_get_rsc_info: 
>   Resource 'vha-94b33532-15ba-4923-a920-ab9268ccd856:1' not found (1 active 
> resources)
> May 05 22:12:53 [3555] vqa11       lrmd:     info: process_lrmd_rsc_register: 
>   Added 'vha-94b33532-15ba-4923-a920-ab9268ccd856' to the rsc list (2 active 
> resources)
> May 05 22:12:53 [3555] vqa11       lrmd:   notice: operation_finished:  
> vgc_virtual_ip_monitor_0:3616 [ Converted dotted-quad netmask to CIDR as: 22 ]
> May 05 22:12:54 [3558] vqa11       crmd:     info: 
> services_os_action_execute:  Managed vgc-cm-agent.ocf_meta-data_0 process 
> 3633 exited with rc=0
> May 05 22:12:54 [3558] vqa11       crmd:   notice: process_lrm_event:   LRM 
> operation vha-94b33532-15ba-4923-a920-ab9268ccd856_monitor_0 (call=10, rc=7, 
> cib-update=8, confirmed=true) not running
> May 05 22:12:55 [3558] vqa11       crmd:     info: 
> services_os_action_execute:  Managed IPaddr2_meta-data_0 process 3659 exited 
> with rc=0
> May 05 22:12:55 [3558] vqa11       crmd:   notice: process_lrm_event:   LRM 
> operation vgc_virtual_ip_monitor_0 (call=5, rc=7, cib-update=9, 
> confirmed=true) not running
> May 05 22:12:55 [3556] vqa11      attrd:   notice: attrd_trigger_update:      
>   Sending flush op to all hosts for: probe_complete (true)
> May 05 22:12:55 [3556] vqa11      attrd:   notice: attrd_perform_update:      
>   Sent update 8: probe_complete=true
> May 05 22:12:55 [3558] vqa11       crmd:   notice: process_lrm_event:   LRM 
> operation vha-94b33532-15ba-4923-a920-ab9268ccd856_start_0 (call=14, rc=0, 
> cib-update=10, confirmed=true) ok
> May 05 22:12:56 [3558] vqa11       crmd:   notice: process_lrm_event:   LRM 
> operation vha-94b33532-15ba-4923-a920-ab9268ccd856_monitor_31000 (call=17, 
> rc=0, cib-update=11, confirmed=false) ok
> May 05 22:14:50 [3553] vqa11        cib:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:14:50 [3558] vqa11       crmd:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:14:50 [3558] vqa11       crmd:     info: peer_update_callback:      
>   Client vqa12/peer now has status [offline] (DC=vqa12)
> May 05 22:14:50 [3554] vqa11 stonith-ng:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:14:50 [3558] vqa11       crmd:   notice: peer_update_callback:      
>   Got client status callback - our DC is dead
> May 05 22:14:50 [3558] vqa11       crmd:   notice: do_state_transition:       
>   State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION 
> cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
> May 05 22:14:50 [3558] vqa11       crmd:   notice: do_state_transition:       
>   State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC 
> cause=C_FSA_INTERNAL origin=do_election_check ]
> May 05 22:14:50 [3558] vqa11       crmd:     info: do_te_control:       
> Registering TE UUID: 8e6b7382-90eb-4ac5-80bf-d2feac7e0d7e
> May 05 22:14:50 [3554] vqa11 stonith-ng:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:14:50 [3553] vqa11        cib:     info: crm_update_peer_proc:      
>   pcmk_mcp_dispatch: Node vqa12[33663168] - unknown is now (null)
> May 05 22:14:50 [3558] vqa11       crmd:     info: set_graph_functions:       
>   Setting custom graph functions
> May 05 22:14:50 [3558] vqa11       crmd:     info: do_dc_takeover:      
> Taking over DC status for this partition
> May 05 22:14:50 [3553] vqa11        cib:     info: cib_process_readwrite:     
>   We are now in R/W mode
> May 05 22:14:50 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_master for section 'all' (origin=local/crmd/12, 
> version=0.7126.67): OK (rc=0)
> May 05 22:14:50 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_modify for section cib (origin=local/crmd/13, 
> version=0.7126.68): OK (rc=0)
> May 05 22:14:50 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_modify for section crm_config 
> (origin=local/crmd/15, version=0.7126.69): OK (rc=0)
> May 05 22:14:50 [3558] vqa11       crmd:     info: join_make_offer:     
> Making join offers based on membership 4120
> May 05 22:14:50 [3558] vqa11       crmd:     info: do_dc_join_offer_all:      
>   join-1: Waiting on 1 outstanding join acks
> May 05 22:14:50 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_modify for section crm_config 
> (origin=local/crmd/17, version=0.7126.70): OK (rc=0)
> May 05 22:14:50 [3558] vqa11       crmd:     info: crm_update_peer_expected:  
>   do_dc_join_filter_offer: Node vqa11[16885952] - expected state is now member
> May 05 22:14:51 [3558] vqa11       crmd:     info: do_dc_join_offer_all:      
>   A new node joined the cluster
> May 05 22:14:51 [3558] vqa11       crmd:     info: do_dc_join_offer_all:      
>   join-3: Waiting on 2 outstanding join acks
> May 05 22:14:51 [3558] vqa11       crmd:     info: update_dc:   Set DC to 
> vqa11 (3.0.7)
> May 05 22:14:52 [3558] vqa11       crmd:     info: crm_update_peer_expected:  
>   do_dc_join_filter_offer: Node vqa12[33663168] - expected state is now member
> May 05 22:14:52 [3558] vqa11       crmd:     info: do_state_transition:       
>   State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED 
> cause=C_FSA_INTERNAL origin=check_join_state ]
> May 05 22:14:52 [3558] vqa11       crmd:     info: do_dc_join_finalize:       
>   join-3: Syncing the CIB from vqa11 to the rest of the cluster
> May 05 22:14:52 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_sync for section 'all' (origin=local/crmd/27, 
> version=0.7126.73): OK (rc=0)
> May 05 22:14:52 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_modify for section nodes (origin=local/crmd/28, 
> version=0.7126.74): OK (rc=0)
> May 05 22:14:52 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_modify for section nodes (origin=local/crmd/29, 
> version=0.7126.75): OK (rc=0)
> May 05 22:14:52 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_delete for section 
> //node_state[@uname='vqa12']/transient_attributes (origin=vqa12/crmd/8, 
> version=0.7126.76): OK (rc=0)
> May 05 22:14:52 [3553] vqa11        cib:  warning: cib_process_request:       
>   Operation complete: op cib_modify for section status 
> (origin=vqa12/attrd/61, version=0.7126.76): No such device or address (rc=-6)
> May 05 22:14:53 [3558] vqa11       crmd:     info: 
> services_os_action_execute:  Managed vgc-cm-agent.ocf_meta-data_0 process 
> 4015 exited with rc=0
> May 05 22:14:53 [3558] vqa11       crmd:     info: do_dc_join_ack:      
> join-3: Updating node state to member for vqa12
> May 05 22:14:53 [3558] vqa11       crmd:     info: erase_status_tag:    
> Deleting xpath: //node_state[@uname='vqa12']/lrm
> May 05 22:14:53 [3558] vqa11       crmd:     info: do_dc_join_ack:      
> join-3: Updating node state to member for vqa11
> May 05 22:14:53 [3558] vqa11       crmd:     info: erase_status_tag:    
> Deleting xpath: //node_state[@uname='vqa11']/lrm
> May 05 22:14:53 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_delete for section 
> //node_state[@uname='vqa12']/lrm (origin=local/crmd/30, version=0.7126.79): 
> OK (rc=0)
> May 05 22:14:53 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_delete for section 
> //node_state[@uname='vqa11']/lrm (origin=local/crmd/32, version=0.7126.81): 
> OK (rc=0)
> May 05 22:14:53 [3558] vqa11       crmd:     info: do_state_transition:       
>   State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED 
> cause=C_FSA_INTERNAL origin=check_join_state ]
> May 05 22:14:53 [3556] vqa11      attrd:   notice: attrd_local_callback:      
>   Sending full refresh (origin=crmd)
> May 05 22:14:53 [3556] vqa11      attrd:   notice: attrd_trigger_update:      
>   Sending flush op to all hosts for: probe_complete (true)
> May 05 22:14:53 [3558] vqa11       crmd:     info: abort_transition_graph:    
>   do_te_invoke:156 - Triggered transition abort (complete=1) : Peer Cancelled
> May 05 22:14:53 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_modify for section nodes (origin=local/crmd/34, 
> version=0.7126.83): OK (rc=0)
> May 05 22:14:53 [3553] vqa11        cib:     info: cib_process_request:       
>   Operation complete: op cib_modify for section cib (origin=local/crmd/36, 
> version=0.7126.85): OK (rc=0)
> 
> 
> 
> [root@vqa12 bug17873]# rpm -qa | grep pacemaker
> pacemaker-cluster-libs-1.1.8-7.el6.x86_64
> pacemaker-cli-1.1.8-7.el6.x86_64
> pacemaker-1.1.8-7.el6.x86_64
> pacemaker-libs-1.1.8-7.el6.x86_64
> [root@vqa12 bug17873]# rpm -qa | grep corosync
> corosync-1.4.1-15.el6.x86_64
> corosynclib-1.4.1-15.el6.x86_64
> [root@vqa12 bug17873]# cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 6.2 (Santiago)
> 
> 
> Regards,
>  Kiran
> 


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
