On Thu, Jun 24, 2010 at 1:54 PM, Koch, Sebastian <sebastian.k...@netzwerk.de> wrote:
> Hi,
>
> thanks for your reply. It wasn't clear to me that pacemaker is issuing status
> commands in the background even on the passive node.

We run a single monitor op for each resource on each node when it joins the
cluster. This is the only way to be sure what the current state of the
resource is.

> The problem was that on the passive node the symlinks to the Nagios
> configuration were broken because the DRBD device was mounted on the other
> node. Therefore I just copied all needed configs to my /mnt/cluster/ dir,
> and if the node is passive it can use the configs from there. If it becomes
> active, the DRBD device will be mounted on /mnt/cluster.
>
> Do you have a better idea

Not really. Unless you want to relax the checks in the RA.

> because it seems to me like a roundabout solution and I would like to solve
> it in a more elegant way. Currently I have the same issue with
> ClusterMonitor, because it's trying to write the HTML to /var/www, but I
> symlinked this to the cluster dir and therefore the status command fails on
> the passive node.
>
> Thanks in advance.
>
> Sebastian Koch
>
> NETZWERK GmbH
> Phone: +49.711.220 5498 81
> Note new mobile number: +49.1522.299 6524
> Fax: +49.711.220 5499 77
> Email: sebastian.k...@netzwerk.de
> Web: www.netzwerk.de
> NETZWERK GmbH, Kurze Str. 40, 70794 Filderstadt-Bonlanden
> Managing directors: Siegfried Herner, Hans-Baldung Luley, Olaf Müller-Haberland
> Registered office: Filderstadt-Bonlanden, Amtsgericht Stuttgart HRB 225547,
> WEEE-Reg Nr. DE 185 622 492
>
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Thursday, 24 June 2010 09:19
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] rsc_op: Hard error - res_Nagios_monitor_0 failed with
> rc=6: Preventing res_Nagios from re-starting anywhere in the cluster
>
> On Wed, Jun 23, 2010 at 5:19 PM, Koch, Sebastian <sebastian.k...@netzwerk.de> wrote:
>> Hi,
>>
>> I got a 2 Node Cluster up and running and right now I am trying to
>> configure a Nagios3 Resource.
>> Therefore I already fixed the nagios init script, as it didn't pass the
>> LSB compatibility checks described here:
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html
>>
>> I just needed to make sure the pid file gets removed when the stop function
>> is called. After this small change the script passed all the LSB checks.
>> Below you find the error message:
>>
>> r...@pilot01-node2:/var/run/nagios3# crm_verify -LV
>> crm_verify[7094]: 2010/06/23_16:37:27 ERROR: unpack_rsc_op: Hard error - res_Nagios_monitor_0 failed with rc=6: Preventing res_Nagios from re-starting anywhere in the cluster
>
> Looks like it's still failing the fifth LSB check from the above URL:
> "Did the command print result: 3"
>
>> crm_verify[7094]: 2010/06/23_16:37:27 WARN: native_color: Resource res_Nagios cannot run anywhere
>> Warnings found during check: config may not be valid
>>
>> I tried to find out what the init script must provide to be usable in
>> Pacemaker, but I only found the LSB compatibility hints on the Pacemaker
>> website. I think I configured the primitive wrong, or maybe the init script
>> is still wrong? Even if I configure it with an op monitor action it fails.
>> And even a crm resource cleanup res_Nagios doesn't help me start the
>> resource.
>>
>> I can run Nagios manually on the active node. I linked all shared
>> directories to my cluster storage device like this:
>>
>> r...@pilot01-node2:/etc# ll /var/lib/nagios3* /etc/nagios*
>> lrwxrwxrwx 1 root root 25 23. Jun 13:54 /etc/nagios3 -> /mnt/cluster/etc/nagios3/
>> lrwxrwxrwx 1 root root 29 23. Jun 14:04 /var/lib/nagios3 -> /mnt/cluster/var/lib/nagios3/
>>
>> /etc/nagios3_bak:
>> insgesamt 88K
>> drwxr-xr-x  4 root root    146 23. Jun 13:54 .
>> drwxr-xr-x 75 root root   4,0K 23. Jun 17:08 ..
>> -rw-r--r--  1 root root   1,9K 30. Jun 2009 apache2.conf
>> -rw-r--r--  1 root root    11K 23. Jun 13:49 cgi.cfg
>> -rw-r--r--  1 root root   2,4K 2. Jul 2009 commands.cfg
>> drwxr-xr-x  2 root root   4,0K 7. Jun 19:16 conf.d
>> -rw-r--r--  1 root root     20 23. Jun 13:49 htpasswd.users
>> -rw-r--r--  1 root root    42K 2. Jul 2009 nagios.cfg
>> -rw-r-----  1 root nagios 1,3K 30. Jun 2009 resource.cfg
>> drwxr-xr-x  2 root root   4,0K 7. Jun 19:16 stylesheets
>>
>> /etc/nagios-plugins:
>> insgesamt 12K
>> drwxr-xr-x  3 root root   19 7. Jun 19:16 .
>> drwxr-xr-x 75 root root 4,0K 23. Jun 17:08 ..
>> drwxr-xr-x  2 root root 4,0K 7. Jun 19:16 config
>>
>> /var/lib/nagios3_bak:
>> insgesamt 20K
>> drwxr-x---  4 nagios nagios     47 23. Jun 14:02 .
>> drwxr-xr-x 33 root   root     4,0K 23. Jun 14:04 ..
>> -rw-------  1 nagios www-data  14K 23. Jun 14:02 retention.dat
>> drwx------  2 nagios www-data    6 2. Jul 2009 rw
>> drwxr-x---  3 nagios nagios     25 7. Jun 19:16 spool
>>
>> Here is my config.
>>
>> ########################
>> ### 3. Cluster State ###
>> ########################
>>
>> ============
>> Last updated: Wed Jun 23 17:16:33 2010
>> Stack: openais
>> Current DC: pilot01-node2 - partition with quorum
>> Version: 1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75
>> 2 Nodes configured, 2 expected votes
>> 4 Resources configured.
>> ============
>>
>> Node pilot01-node1: standby
>> Online: [ pilot01-node2 ]
>>
>> Full list of resources:
>>
>>  Resource Group: grp_MySQL
>>      res_Filesystem      (ocf::heartbeat:Filesystem):    Started pilot01-node2
>>      res_ClusterIP       (ocf::heartbeat:IPaddr2):       Started pilot01-node2
>>      res_MySQL           (lsb:mysql):                    Started pilot01-node2
>>      res_Apache          (lsb:apache2):                  Started pilot01-node2
>>      res_ClusterMonitor  (ocf::pacemaker:ClusterMon):    Started pilot01-node2
>>      res_Nagios          (lsb:nagios3):                  Stopped
>>  Master/Slave Set: ms_drbd_mysql0
>>      Masters: [ pilot01-node2 ]
>>      Stopped: [ drbd_pilot0:0 ]
>>  Clone Set: cl-pinggw
>>      Started: [ pilot01-node2 ]
>>      Stopped: [ pinggw:0 ]
>>  Monitor-Cluster (ocf::pacemaker:ClusterMon): Started pilot01-node1 (unmanaged) FAILED
>>
>> Failed actions:
>>     Monitor-Cluster_stop_0 (node=pilot01-node1, call=34, rc=1, status=complete): unknown error
>>     res_Nagios_monitor_0 (node=pilot01-node1, call=84, rc=6, status=complete): not configured
>>
>> #########################
>> ### 4. Cluster Config ###
>> #########################
>>
>> node pilot01-node1 \
>>         attributes standby="on"
>> node pilot01-node2 \
>>         attributes standby="off"
>> primitive Monitor-Cluster ocf:pacemaker:ClusterMon \
>>         params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
>>         params pidfile="/var/run/rlb-cluster-monitor.pid" \
>>         op start interval="0" timeout="90s" \
>>         op stop interval="0" timeout="100s"
>> primitive drbd_pilot0 ocf:linbit:drbd \
>>         params drbd_resource="pilot0" \
>>         operations $id="drbd_pilot0-operations" \
>>         op monitor interval="15s"
>> primitive pinggw ocf:pacemaker:pingd \
>>         params host_list="10.1.1.162" multiplier="200" \
>>         op monitor interval="10s"
>> primitive res_Apache lsb:apache2 \
>>         operations $id="res_Apache-operations" \
>>         op monitor interval="15s" timeout="20s" start-delay="15s"
>> primitive res_ClusterIP ocf:heartbeat:IPaddr2 \
>>         params iflabel="ClusterIP" ip="10.1.1.12" nic="eth0" cidr_netmask="24" \
>>         operations $id="res_ClusterIP_1-operations" \
>>         op monitor start-delay="0" interval="10s"
>> primitive res_ClusterMonitor ocf:pacemaker:ClusterMon \
>>         params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
>>         params pidfile="/var/run/rlb-cluster-monitor.pid" \
>>         op start interval="0" timeout="90s" \
>>         op stop interval="0" timeout="100s" \
>>         meta target-role="Started"
>> primitive res_Filesystem ocf:heartbeat:Filesystem \
>>         params fstype="xfs" directory="/mnt/cluster" device="/dev/drbd0" options="noatime,nodiratime,barrier=0"
>> primitive res_MySQL lsb:mysql
>> primitive res_Nagios lsb:nagios3 \
>>         operations $id="res_Nagios-operations" \
>>         op monitor interval="15s" timeout="20s" \
>>         meta target-role="Started"
>> group grp_MySQL res_Filesystem res_ClusterIP res_MySQL res_Apache res_ClusterMonitor res_Nagios
>> ms ms_drbd_mysql0 drbd_pilot0 \
>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> clone cl-pinggw pinggw \
>>         meta globally-unique="false"
>> location drbd-fence-by-handler-ms_drbd_mysql0 ms_drbd_mysql0 \
>>         rule $id="drbd-fence-by-handler-rule-ms_drbd_mysql0" $role="Master" -inf: #uname ne pilot01-node2
>> location grp_MySQL-with-pinggw grp_MySQL \
>>         rule $id="grp_MySQL-with-pinggw-rule-1" -inf: not_defined pingd or pingd lte 0
>> colocation col_drbd_on_mysql inf: grp_MySQL ms_drbd_mysql0:Master
>> order mysql_after_drbd inf: ms_drbd_mysql0:promote grp_MySQL:start
>> property $id="cib-bootstrap-options" \
>>         expected-quorum-votes="2" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="ignore" \
>>         dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
>>         cluster-infrastructure="openais" \
>>         last-lrm-refresh="1277306106" \
>>         symmetric-cluster="true" \
>>         migration-threshold="1" \
>>         default-action-timeout="240s"
>>
>> Thanks for your help in advance.
>>
>> Sebastian

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
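
For reference, the two points from this thread (the one-off monitor op that
runs on every node, including the passive one, and the LSB rule that status
must report 3 for a stopped service) can be sketched roughly as below. This is
a minimal sketch under assumptions, not the actual nagios3 init script: the
function name lsb_status is hypothetical, and it assumes the rc=6 came from the
script checking its configuration before answering status. The idea is that
status only looks at the pid file and the process, so it can still answer
"not running" when the config under /mnt/cluster is not mounted.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the nagios3 script).
# Returns LSB status codes based only on a pid file:
#   0 = running, 1 = dead but pid file exists, 3 = not running.
# It deliberately does not look at any configuration file, so it does not
# depend on the shared storage being mounted on this node.
lsb_status() {
    pidfile="$1"

    # No pid file: the service is cleanly stopped.
    if [ ! -f "$pidfile" ]; then
        return 3
    fi

    pid=$(cat "$pidfile" 2>/dev/null) || pid=""

    # kill -0 sends no signal; it only tests whether the process exists.
    # Caveat: it also fails for a process we are not allowed to signal.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0
    fi

    # Pid file is present but no such process: stale pid file.
    return 1
}
```

In an init script this could be wired up in the status branch, for example
`status) lsb_status /var/run/nagios3/nagios3.pid; exit $? ;;`. The pid file
path here is an assumption based on the /var/run/nagios3 directory shown in
the prompt above; any configuration check would then stay in the start branch.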