Re: [Pacemaker] Exiting corosync-notifyd results in shutting down of pacemakerd

Andrew Beekhof Wed, 10 Oct 2012 17:10:38 -0700

On Thu, Oct 4, 2012 at 5:57 PM, Grüninger, Andreas (LGL Extern)
<andreas.gruenin...@lgl.bwl.de> wrote:
>>> Is this an error or the desired result?
>
>> Based on the logs, pacemaker thinks corosync died.  Did that happen?
>>If so there is not much pacemaker can do :-(
>
> That would be absolutely fine if corosync had died.
> But corosync did not die; it is still healthy.
> What exits is corosync-notifyd, which runs as a separate process alongside
> corosync and is stopped with kill when running as a daemon, or with Ctrl-C
> when running in the foreground.
> The job of corosync-notifyd is to send SNMP traps.
> For pacemaker, crm_mon -C .. -S ... provides this functionality.
>
> So either corosync-notifyd sends the wrong signal or pacemaker does a little
> too much.
> Pacemaker should simply ignore this connection ending.


All the Pacemaker daemons are being told, by Corosync itself, that
their connections to Corosync are dead.
It's a little difficult to ignore that.

> Is there a chance of fixing this in pacemaker, or would it be better solved
> in corosync/corosync-notifyd?

It needs to be addressed in corosync/corosync-notifyd.
Corosync's CPG library is the one invoking our
cpg_connection_destroy() callback.
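To make the chain of events in the log easier to follow, here is a toy Python model. It is not Pacemaker's or Corosync's code: each daemon's CPG dispatch reports an error, its destroy callback fires, and the daemon asks to shut down; any later send on the dead handle fails. The numeric codes mirror corosync's cs_error_t values (2 is CS_ERR_LIBRARY, 9 is CS_ERR_BAD_HANDLE), which appear to match the "Connection to the CPG API failed: 2" and "(rc=9) Bad handle" lines in the log. The names CpgClient, destroy_callback and shutdown_requests are hypothetical and exist only for this illustration.

```python
from enum import IntEnum


class CsError(IntEnum):
    """Subset of corosync's cs_error_t codes (values as in corosync's corotypes.h)."""
    CS_OK = 1
    CS_ERR_LIBRARY = 2
    CS_ERR_BAD_HANDLE = 9


class CpgClient:
    """Hypothetical stand-in for one Pacemaker daemon's CPG connection."""

    def __init__(self, name, on_destroy):
        self.name = name
        self.on_destroy = on_destroy
        self.connected = True

    def dispatch(self, rc):
        """Handle one dispatch result; rc stands in for what cpg_dispatch() returned."""
        if rc != CsError.CS_OK:
            print(f"{self.name}: Connection to the CPG API failed: {int(rc)}")
            self.connected = False
            self.on_destroy(self)  # the library-reported failure reaches the daemon here
            return False
        return True

    def send(self):
        """Model of sending a CPG message: fails once the handle is gone."""
        return CsError.CS_OK if self.connected else CsError.CS_ERR_BAD_HANDLE


shutdown_requests = []


def destroy_callback(client):
    # From the daemon's point of view its connection to corosync is dead,
    # so all it can do is record a shutdown request.
    shutdown_requests.append(client.name)


# Simulate every daemon being told that its CPG connection is dead.
daemons = [CpgClient(n, destroy_callback) for n in ("cib", "stonith-ng", "attrd")]
for d in daemons:
    d.dispatch(CsError.CS_ERR_LIBRARY)

print(daemons[0].send().name)  # prints "CS_ERR_BAD_HANDLE"
```

The model deliberately leaves out why corosync reports the connections as destroyed when corosync-notifyd exits; that part lives in corosync, not in Pacemaker.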

>
> Andreas
>
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Wednesday, 3 October 2012 01:09
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Exiting corosync-notifyd results in shutting down of
> pacemakerd
>
> On Wed, Oct 3, 2012 at 2:51 AM, Grüninger, Andreas (LGL Extern) 
> <andreas.gruenin...@lgl.bwl.de> wrote:
>> I am currently investigating how to monitor corosync/pacemaker with SNMP.
>> crm_mon used with the OCF resource ClusterMon works as it should.
>>
>> But corosync-notifyd can't be used in our case.
>> I start corosync-notifyd in the foreground as follows:
>>
>>     corosync-notifyd -f -l -s -m 10.50.235.1
>>
>> When I stop the running corosync-notifyd with CTRL-C, pacemaker shuts down 
>> with the following entries in the logfile.
>> Is this an error or the desired result?
>
> Based on the logs, pacemaker thinks corosync died.  Did that happen?
> If so there is not much pacemaker can do :-(
>
>>
>> ....
>> Oct 02 18:42:19 [27126] pacemakerd:    error: cfg_connection_destroy: Connection destroyed
>> Oct 02 18:42:19 [27126] pacemakerd:   notice: pcmk_shutdown_worker: Shuting down Pacemaker
>> Oct 02 18:42:19 [27126] pacemakerd:   notice: stop_child: Stopping crmd: Sent -15 to process 27177
>> Oct 02 18:42:19 [27126] pacemakerd:    error: cpg_connection_destroy: Connection destroyed
>> Oct 02 18:42:19 [27177]       crmd:     info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> Oct 02 18:42:19 [27177]       crmd:   notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
>> Oct 02 18:42:19 [27128] stonith-ng:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
>> Oct 02 18:42:19 [27177]       crmd:     info: do_shutdown_req: Sending shutdown request to zd-sol-s1-v61
>> Oct 02 18:42:19 [27128] stonith-ng:    error: stonith_peer_ais_destroy: AIS connection terminated
>> Oct 02 18:42:19 [27128] stonith-ng:     info: stonith_shutdown: Terminating with  1 clients
>> Oct 02 18:42:19 [27130]      attrd:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
>> Oct 02 18:42:19 [27130]      attrd:     crit: attrd_ais_destroy: Lost connection to Corosync service!
>> Oct 02 18:42:19 [27130]      attrd:   notice: main: Exiting...
>> Oct 02 18:42:19 [27130]      attrd:   notice: main: Disconnecting client 81ffc38, pid=27177...
>> Oct 02 18:42:19 [27128] stonith-ng:     info: qb_ipcs_us_withdraw: withdrawing server sockets
>> Oct 02 18:42:19 [27128] stonith-ng:     info: crm_xml_cleanup: Cleaning up memory from libxml2
>> Oct 02 18:42:19 [27130]      attrd:    error: attrd_cib_connection_destroy: Connection to the CIB terminated...
>> Oct 02 18:42:19 [27127]        cib:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
>> Oct 02 18:42:19 [27127]        cib:    error: cib_ais_destroy: Corosync connection lost!  Exiting.
>> Oct 02 18:42:19 [27129]       lrmd:     info: lrmd_ipc_destroy: LRMD client disconnecting 807e768 - name: crmd id: 1d659f61-d6e2-4ef3-f674-b9a8ba8029e8
>> Oct 02 18:42:19 [27127]        cib:     info: terminate_cib: cib_ais_destroy: Exiting fast...
>> Oct 02 18:42:19 [27127]        cib:     info: qb_ipcs_us_withdraw: withdrawing server sockets
>> Oct 02 18:42:19 [27127]        cib:     info: qb_ipcs_us_withdraw: withdrawing server sockets
>> Oct 02 18:42:19 [27127]        cib:     info: qb_ipcs_us_withdraw: withdrawing server sockets
>> Oct 02 18:42:19 [27126] pacemakerd:    error: pcmk_child_exit: Child process attrd exited (pid=27130, rc=1)
>> Oct 02 18:42:19 [27126] pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> Oct 02 18:42:19 [27126] pacemakerd:    error: pcmk_child_exit: Child process cib exited (pid=27127, rc=64)
>> Oct 02 18:42:19 [27126] pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> Oct 02 18:42:19 [27126] pacemakerd:   notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=27177, core=0)
>> Oct 02 18:42:19 [27126] pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> Oct 02 18:42:19 [27126] pacemakerd:   notice: stop_child: Stopping pengine: Sent -15 to process 27131
>> Oct 02 18:42:19 [27126] pacemakerd:     info: pcmk_child_exit: Child process pengine exited (pid=27131, rc=0)
>> Oct 02 18:42:19 [27126] pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> Oct 02 18:42:19 [27126] pacemakerd:   notice: stop_child: Stopping lrmd: Sent -15 to process 27129
>> Oct 02 18:42:19 [27129]       lrmd:     info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> Oct 02 18:42:19 [27129]       lrmd:     info: lrmd_shutdown: Terminating with  0 clients
>> Oct 02 18:42:19 [27129]       lrmd:     info: qb_ipcs_us_withdraw: withdrawing server sockets
>> Oct 02 18:42:19 [27126] pacemakerd:     info: pcmk_child_exit: Child process lrmd exited (pid=27129, rc=0)
>> Oct 02 18:42:19 [27126] pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> Oct 02 18:42:19 [27126] pacemakerd:   notice: stop_child: Stopping stonith-ng: Sent -15 to process 27128
>> Oct 02 18:42:19 [27126] pacemakerd:   notice: pcmk_child_exit: Child process stonith-ng terminated with signal 11 (pid=27128, core=128)
>> Oct 02 18:42:19 [27126] pacemakerd:    error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
>> Oct 02 18:42:19 [27126] pacemakerd:   notice: pcmk_shutdown_worker: Shutdown complete
>> Oct 02 18:42:19 [27126] pacemakerd:     info: qb_ipcs_us_withdraw: withdrawing server sockets
>> Oct 02 18:42:19 [27126] pacemakerd:     info: main: Exiting pacemakerd
>>
>> Andreas
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org