>> Is this an error or the desired result?
> Based on the logs, pacemaker thinks corosync died. Did that happen?
> If so there is not much pacemaker can do :-(
That would be perfectly fine if corosync had died. But corosync did not die; it is still healthy.
The process that exits is corosync-notifyd, which is started in addition to corosync as a separate process and is stopped with kill when it runs as a daemon or with Ctrl-C when it runs in the foreground.
The job of corosync-notifyd is to send SNMP traps. It offers the functionality that crm_mon -C .. -S ... offers for pacemaker.
So either corosync-notifyd sends the wrong signal or pacemaker does a little too much.
Pacemaker should simply ignore this connection ending.
Is there a chance of handling this in pacemaker, or should it rather be solved in corosync/corosync-notifyd?

Andreas

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Wednesday, 3 October 2012 01:09
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Exiting corosync-notifyd results in shutting down of pacemakerd

On Wed, Oct 3, 2012 at 2:51 AM, Grüninger, Andreas (LGL Extern) <andreas.gruenin...@lgl.bwl.de> wrote:
> I am currently investigating the monitoring of corosync/pacemaker with snmp.
> crm_mon used with the OCF resource ClusterMon works as it should.
>
> But corosync-notifyd can't be used in our case.
> I start corosync-notifyd in the foreground as follows
> corosync-notifyd -f -l -s -m 10.50.235.1
>
> When I stop the running corosync-notifyd with CTRL-C, pacemaker shuts down
> with the following entries in the logfile.
> Is this an error or the desired result?

Based on the logs, pacemaker thinks corosync died. Did that happen?
If so there is not much pacemaker can do :-(

>
> ....
> Oct 02 18:42:19 [27126] pacemakerd: error: cfg_connection_destroy: Connection destroyed
> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_shutdown_worker: Shuting down Pacemaker
> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping crmd: Sent -15 to process 27177
> Oct 02 18:42:19 [27126] pacemakerd: error: cpg_connection_destroy: Connection destroyed
> Oct 02 18:42:19 [27177] crmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Oct 02 18:42:19 [27177] crmd: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
> Oct 02 18:42:19 [27128] stonith-ng: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
> Oct 02 18:42:19 [27177] crmd: info: do_shutdown_req: Sending shutdown request to zd-sol-s1-v61
> Oct 02 18:42:19 [27128] stonith-ng: error: stonith_peer_ais_destroy: AIS connection terminated
> Oct 02 18:42:19 [27128] stonith-ng: info: stonith_shutdown: Terminating with 1 clients
> Oct 02 18:42:19 [27130] attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
> Oct 02 18:42:19 [27130] attrd: crit: attrd_ais_destroy: Lost connection to Corosync service!
> Oct 02 18:42:19 [27130] attrd: notice: main: Exiting...
> Oct 02 18:42:19 [27130] attrd: notice: main: Disconnecting client 81ffc38, pid=27177...
> Oct 02 18:42:19 [27128] stonith-ng: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27128] stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2
> Oct 02 18:42:19 [27130] attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
> Oct 02 18:42:19 [27127] cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: 2
> Oct 02 18:42:19 [27127] cib: error: cib_ais_destroy: Corosync connection lost! Exiting.
> Oct 02 18:42:19 [27129] lrmd: info: lrmd_ipc_destroy: LRMD client disconnecting 807e768 - name: crmd id: 1d659f61-d6e2-4ef3-f674-b9a8ba8029e8
> Oct 02 18:42:19 [27127] cib: info: terminate_cib: cib_ais_destroy: Exiting fast...
> Oct 02 18:42:19 [27127] cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27127] cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27127] cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27126] pacemakerd: error: pcmk_child_exit: Child process attrd exited (pid=27130, rc=1)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: error: pcmk_child_exit: Child process cib exited (pid=27127, rc=64)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=27177, core=0)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping pengine: Sent -15 to process 27131
> Oct 02 18:42:19 [27126] pacemakerd: info: pcmk_child_exit: Child process pengine exited (pid=27131, rc=0)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping lrmd: Sent -15 to process 27129
> Oct 02 18:42:19 [27129] lrmd: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Oct 02 18:42:19 [27129] lrmd: info: lrmd_shutdown: Terminating with 0 clients
> Oct 02 18:42:19 [27129] lrmd: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27126] pacemakerd: info: pcmk_child_exit: Child process lrmd exited (pid=27129, rc=0)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: notice: stop_child: Stopping stonith-ng: Sent -15 to process 27128
> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_child_exit: Child process stonith-ng terminated with signal 11 (pid=27128, core=128)
> Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
> Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
> Oct 02 18:42:19 [27126] pacemakerd: info: qb_ipcs_us_withdraw: withdrawing server sockets
> Oct 02 18:42:19 [27126] pacemakerd: info: main: Exiting pacemakerd
>
> Andreas
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
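For anyone wanting to see at a glance how each pacemaker child ended in a log like the one quoted in this thread, here is a minimal sketch in Python. It is not part of pacemaker or corosync. The function name summarize_child_exits and the regular expression are assumptions made for illustration, derived only from the pcmk_child_exit lines shown in this thread, and the script assumes the mail quoting ("> ") has been stripped so that each log entry sits on one line.

```python
import re

# Hypothetical pattern, derived only from the pcmk_child_exit lines quoted in
# this thread. Other pacemaker versions may word these messages differently.
CHILD_EXIT = re.compile(
    r"pcmk_child_exit:\s+Child process\s+(?P<name>\S+)\s+"
    r"(?:exited \(pid=(?P<pid_rc>\d+), rc=(?P<rc>\d+)\)"
    r"|terminated with signal (?P<sig>\d+) \(pid=(?P<pid_sig>\d+), core=\d+\))"
)


def summarize_child_exits(log_text):
    """Return (name, pid, how) tuples for each child exit, in log order."""
    results = []
    for match in CHILD_EXIT.finditer(log_text):
        pid = int(match.group("pid_rc") or match.group("pid_sig"))
        if match.group("rc") is not None:
            how = "rc=" + match.group("rc")
        else:
            how = "signal " + match.group("sig")
        results.append((match.group("name"), pid, how))
    return results


# Four entries copied from the log in this thread, unquoted and unwrapped,
# plus one unrelated entry that the pattern should skip.
SAMPLE_LOG = """\
Oct 02 18:42:19 [27126] pacemakerd: error: pcmk_child_exit: Child process attrd exited (pid=27130, rc=1)
Oct 02 18:42:19 [27126] pacemakerd: error: send_cpg_message: Sending message via cpg FAILED: (rc=9) Bad handle
Oct 02 18:42:19 [27126] pacemakerd: error: pcmk_child_exit: Child process cib exited (pid=27127, rc=64)
Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=27177, core=0)
Oct 02 18:42:19 [27126] pacemakerd: notice: pcmk_child_exit: Child process stonith-ng terminated with signal 11 (pid=27128, core=128)
"""

if __name__ == "__main__":
    for name, pid, how in summarize_child_exits(SAMPLE_LOG):
        print(f"{name} (pid {pid}): {how}")
```

On the excerpt this lists attrd and cib as exiting with a non-zero return code and crmd and stonith-ng as terminated by a signal, which matches the picture in the log of the children going down once their corosync connections were reported lost.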