On 10/05/2013, at 12:40 AM, Steven Bambling <smbambl...@arin.net> wrote:
> I'm having some issues with getting some cluster monitoring set up and
> configured on a 3 node multi-state cluster. I'm using Florian's blog as an
> example
> http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/.
>
> When I create the primitive resource it starts on one of my nodes but spawns
> multiple instances of crm_mon. I don't see any reason that would cause it to
> spawn multiple instances, it's very odd behavior.

If you run:

/usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html

manually a few times, what happens? Multiple processes?

> I was also looking for some clarification on what this resource provides... it
> looks to me that it kicks off a crm_mon in daemon mode that will update a
> .html file and with -E it will run an external script. But the resource
> itself doesn't trigger anything if another resource changes state, only if the
> crm_mon process (monitored with PID) fails and it has to restart.

Correct, it just updates the HTML file, which you can see in your browser.
Or, with -E, it can send an email or SNMP alert.

> If this is correct what is the best practice for monitoring additional
> resource states?

Define "additional"?
If the resource fails we'll normally recover it automatically.

> v/r
>
> STEVE
>
> Below are some additional data points.
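For reference, here is a minimal sketch of the kind of script -E can call. It is not the pcmk_snmp_helper.sh from the blog post: the function names, the EVENT_LOG variable and the log path are made up for illustration. It also assumes crm_mon hands the event to the agent through CRM_notify_* environment variables (rsc, task, node, desc, rc, target_rc), so check that against your version before relying on it.

```shell
#!/bin/sh
# Hypothetical external agent for "crm_mon -E" (illustration only).
# Assumption: crm_mon exports CRM_notify_* variables for each event.

# Format one event as a single line; defaults keep it runnable by hand.
format_event() {
    printf '%s: %s on %s -> %s (rc=%s, expected %s)\n' \
        "${CRM_notify_rsc:-unknown}" \
        "${CRM_notify_task:-unknown}" \
        "${CRM_notify_node:-unknown}" \
        "${CRM_notify_desc:-unknown}" \
        "${CRM_notify_rc:-?}" \
        "${CRM_notify_target_rc:-?}"
}

# Return 0 when the event deserves an alert.
# A monitor whose rc matches the expected rc is routine and is skipped.
should_alert() {
    if [ "${CRM_notify_task:-}" = "monitor" ] &&
       [ "${CRM_notify_rc:-}" = "${CRM_notify_target_rc:-0}" ]; then
        return 1
    fi
    return 0
}

# Do nothing when called without event data (e.g. run by hand with no env).
if [ -n "${CRM_notify_task:-}" ] && should_alert; then
    # EVENT_LOG and its default path are placeholders for this sketch.
    format_event >> "${EVENT_LOG:-/tmp/cluster_events.log}"
fi
```

The append to the log file is only a stand-in; it could be replaced by whatever command sends your trap or mail.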
>
> Creating the Resource
>
> [root@pgdb2 tmp]# crm configure primitive SNMPMon ocf:pacemaker:ClusterMon \
>     params user="root" update="30" extra_options="-E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net" \
>     op monitor on-fail="restart" interval="60"
>
> Manual crm_mon output
>
> Last updated: Thu May 9 10:24:30 2013
> Last change: Thu May 9 10:20:49 2013 via cibadmin on pgdb2.example.com
> Stack: cman
> Current DC: pgdb1.example.com - partition with quorum
> Version: 1.1.8-7.el6-394e906
> 3 Nodes configured, unknown expected votes
> 6 Resources configured.
>
> Node pgdb1.example.com: standby
> Online: [ pgdb2.example.com pgdb3.example.com ]
>
>  PG_REP_VIP (ocf::heartbeat:IPaddr2): Started pgdb2.example.com
>  PG_CLI_VIP (ocf::heartbeat:IPaddr2): Started pgdb2.example.com
>  Master/Slave Set: msPGSQL [PGSQL]
>      Masters: [ pgdb2.example.com ]
>      Slaves: [ pgdb3.example.com ]
>      Stopped: [ PGSQL:2 ]
>  SNMPMon (ocf::pacemaker:ClusterMon): Started pgdb3.example.com
>
> PS to check for process on pgdb3
>
> [root@pgdb3 tmp]# ps aux | grep crm_mon
> root 16097 0.0 0.0 82624 2784 ? S 10:20 0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html
> root 16099 0.0 0.0 82624 2660 ? S 10:20 0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html
> root 16104 0.0 0.0 82624 2448 ? S 10:20 0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e zen.arin.net -h /tmp/ClusterMon_SNMPMon.html
> root 16515 0.0 0.0 103244 852 pts/0 S+ 10:21 0:00 grep crm_mon
>
> Output from corosync.log
>
> May 09 10:20:51 [3100] pgdb3.cha.arin.net lrmd: info: process_lrmd_get_rsc_info: Resource 'SNMPMon' not found (3 active resources)
> May 09 10:20:51 [3100] pgdb3.cha.arin.net lrmd: info: process_lrmd_rsc_register: Added 'SNMPMon' to the rsc list (4 active resources)
> May 09 10:20:52 [3103] pgdb3.cha.arin.net crmd: info: services_os_action_execute: Managed ClusterMon_meta-data_0 process 16010 exited with rc=0
> May 09 10:20:52 [3103] pgdb3.cha.arin.net crmd: notice: process_lrm_event: LRM operation SNMPMon_monitor_0 (call=61, rc=7, cib-update=28, confirmed=true) not running
> May 09 10:20:52 [3103] pgdb3.cha.arin.net crmd: notice: process_lrm_event: LRM operation SNMPMon_start_0 (call=64, rc=0, cib-update=29, confirmed=true) ok
> May 09 10:20:52 [3103] pgdb3.cha.arin.net crmd: notice: process_lrm_event: LRM operation SNMPMon_monitor_60000 (call=67, rc=0, cib-update=30, confirmed=false) ok
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
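
To repeat the ps check above after each manual run, something like the following sketch may help. It counts the crm_mon processes that were started with a given pid file and prints what the pid file itself records. The count_instances helper and the PIDFILE variable are names invented for this example; the path is the one from the ps output above. It only reports numbers and does not explain why more than one instance shows up.

```shell
#!/bin/sh
# Sketch: compare running crm_mon processes against the ClusterMon pid file.
# PIDFILE and count_instances are illustrative names, not part of Pacemaker.
PIDFILE="${PIDFILE:-/tmp/ClusterMon_SNMPMon.pid}"

# Read "ps" output on stdin and count the lines that run crm_mon
# with "-p <pidfile>". Prints 0 when nothing matches.
count_instances() {
    awk -v pf="$1" '
        $0 ~ /crm_mon/ && index($0, "-p " pf) { n++ }
        END { print n + 0 }
    '
}

running=$(ps -eo pid,args | count_instances "$PIDFILE")
echo "crm_mon instances using $PIDFILE: $running"

# Show the pid the daemon wrote, if the file is there.
if [ -r "$PIDFILE" ]; then
    echo "pid recorded in $PIDFILE: $(cat "$PIDFILE")"
else
    echo "no readable pid file at $PIDFILE"
fi
```

If the count is higher than one while the pid file holds a single pid, that matches what the ps output above shows.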