Hi Dejan

hb_report -V says:

cluster-glue: 1.0.2 (b75bd738dc09263a578accc69342de2cb2eb8db6)
I've opened a case with Novell. They will fix this problem by updating to
the newest cluster-glue release. Could it be that I have another
configuration issue in my cluster config? I think that with the following
settings, the resource should be monitored:

...
primitive MySQL_MonitorAgent_Resource lsb:mysql-monitor-agent \
        meta migration-threshold="3" \
        op monitor interval="10s" timeout="20s" on-fail="restart"
op_defaults $id="op_defaults-options" \
        on-fail="restart" \
        enabled="true"
property $id="cib-bootstrap-options" \
        expected-quorum-votes="2" \
        dc-version="1.0.6-c48e3360eb18c53fd68bb7e7dbe39279ccbc0354" \
        cluster-infrastructure="openais" \
        stonith-enabled="true" \
        no-quorum-policy="ignore" \
        stonith-action="reboot" \
        last-lrm-refresh="1268838090"
...

And when I look at the last-run time with "crm_mon -fort1", it shows:

MySQL_Server_Resource: migration-threshold=3
    + (32) stop: last-rc-change='Wed Mar 17 10:49:55 2010'
      last-run='Wed Mar 17 10:49:55 2010' exec-time=5060ms queue-time=0ms
      rc=0 (ok)
    + (40) start: last-rc-change='Wed Mar 17 11:09:06 2010'
      last-run='Wed Mar 17 11:09:06 2010' exec-time=4080ms queue-time=0ms
      rc=0 (ok)
    + (41) monitor: interval=20000ms last-rc-change='Wed Mar 17 11:09:10 2010'
      last-run='Wed Mar 17 11:09:10 2010' exec-time=20ms queue-time=0ms
      rc=0 (ok)

And the results above are from yesterday...

Thanks for your help.
Kind regards,
Tom

2010/3/18 Dejan Muhamedagic <deja...@fastmail.fm>:
> Hi,
>
> On Wed, Mar 17, 2010 at 12:38:47PM +0100, Tom Tux wrote:
>> Hi Dejan
>>
>> Thanks for your answer.
>>
>> I'm using this cluster with the packages from the HAE
>> (HighAvailability-Extension) repository from SLES11. Therefore, is it
>> possible to upgrade cluster-glue from source?
>
> Yes, though I don't think that any SLE11 version has this bug.
> When was your version released? What does hb_report -V say?
>
>> I think the better
>> way is to wait for updates in the HAE repository from Novell.
>> Or do
>> you have experience upgrading cluster-glue from source (even if
>> it is installed with zypper/rpm)?
>>
>> Do you know when the HAE repository will be updated?
>
> Can't say. Best would be if you talk to Novell about the issue.
>
> Cheers,
>
> Dejan
>
>> Thanks a lot.
>> Tom
>>
>>
>> 2010/3/17 Dejan Muhamedagic <deja...@fastmail.fm>:
>> > Hi,
>> >
>> > On Wed, Mar 17, 2010 at 10:57:16AM +0100, Tom Tux wrote:
>> >> Hi Dominik
>> >>
>> >> The problem is that the cluster does not perform the monitor action
>> >> every 20s. The last time it performed the action was at 09:21, and
>> >> now it is 10:37:
>> >
>> > There was a serious bug in some cluster-glue packages. What
>> > you're experiencing sounds like that. I can't say which
>> > packages (probably sth like 1.0.1, they were never released). At
>> > any rate, I'd suggest upgrading to cluster-glue 1.0.3.
>> >
>> > Thanks,
>> >
>> > Dejan
>> >
>> >> MySQL_MonitorAgent_Resource: migration-threshold=3
>> >>     + (479) stop: last-rc-change='Wed Mar 17 09:21:28 2010'
>> >>       last-run='Wed Mar 17 09:21:28 2010' exec-time=3010ms
>> >>       queue-time=0ms rc=0 (ok)
>> >>     + (480) start: last-rc-change='Wed Mar 17 09:21:31 2010'
>> >>       last-run='Wed Mar 17 09:21:31 2010' exec-time=3010ms
>> >>       queue-time=0ms rc=0 (ok)
>> >>     + (481) monitor: interval=10000ms last-rc-change='Wed Mar 17
>> >>       09:21:34 2010' last-run='Wed Mar 17 09:21:34 2010' exec-time=20ms
>> >>       queue-time=0ms rc=0 (ok)
>> >>
>> >> If I restart the whole cluster, then the new return code (exit 99 or
>> >> exit 4) will be seen by the cluster monitor.
>> >>
>> >>
>> >> 2010/3/17 Dominik Klein <d...@in-telegence.net>:
>> >> > Hi Tom
>> >> >
>> >> > Have a look at the logs and see whether the monitor op really returns
>> >> > 99 (grep for the resource id). If so, I'm not sure what the cluster
>> >> > does with rc=99. As far as I know, rc=4 would be status=failed (unknown
>> >> > actually).
>> >> >
>> >> > Regards
>> >> > Dominik
>> >> >
>> >> > Tom Tux wrote:
>> >> >> Thanks for your hint.
>> >> >>
>> >> >> I've configured an LSB resource like this (with migration-threshold):
>> >> >>
>> >> >> primitive MySQL_MonitorAgent_Resource lsb:mysql-monitor-agent \
>> >> >>         meta target-role="Started" migration-threshold="3" \
>> >> >>         op monitor interval="10s" timeout="20s" on-fail="restart"
>> >> >>
>> >> >> I have now modified the init script "/etc/init.d/mysql-monitor-agent"
>> >> >> to exit with a return code not equal to "0" (for example, exit 99)
>> >> >> when the monitor operation queries the status. But the cluster does
>> >> >> not recognise a failed monitor action. Why this behaviour? For the
>> >> >> cluster, everything seems ok.
>> >> >>
>> >> >> node1:/ # showcores.sh MySQL_MonitorAgent_Resource
>> >> >> Resource                    Score    Node  Stickiness #Fail Migration-Threshold
>> >> >> MySQL_MonitorAgent_Resource -1000000 node1 100        0     3
>> >> >> MySQL_MonitorAgent_Resource 100      node2 100        0     3
>> >> >>
>> >> >> I also saw that the "last-run" entry (crm_mon -fort1) for this
>> >> >> resource is not up to date. To me it seems that the monitor action
>> >> >> does not occur every 10 seconds. Why? Any hints about this behaviour?
>> >> >>
>> >> >> Thanks a lot.
>> >> >> Tom
>> >> >>
>> >> >>
>> >> >> 2010/3/16 Dominik Klein <d...@in-telegence.net>:
>> >> >>> Tom Tux wrote:
>> >> >>>> Hi
>> >> >>>>
>> >> >>>> I have a question about resource monitoring:
>> >> >>>> I'm monitoring an IP resource every 20 seconds. I have configured
>> >> >>>> the "on-fail" action with "restart". This works fine. If the
>> >> >>>> "monitor" operation fails, then the resource will be restarted.
>> >> >>>>
>> >> >>>> But how can I define this resource to migrate to the other node if
>> >> >>>> the resource still fails after 10 restarts? Is this possible? How
>> >> >>>> will the "failcount" interact with this scenario?
>> >> >>>>
>> >> >>>> In the documentation I read that the resource "fail_count" will
>> >> >>>> increase every time the resource restarts. But I can't see this
>> >> >>>> fail_count.
>> >> >>> Look at the meta attribute "migration-threshold".
>> >> >>>
>> >> >>> Regards
>> >> >>> Dominik

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
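On the original question that started the thread (restart on monitor failure, then move after N failures): the mechanism is the migration-threshold meta attribute combined with on-fail="restart", optionally with failure-timeout so the fail-count can expire. A sketch in the same crm shell syntax used above; the resource name, IP address, and timings here are made up for illustration:

```
primitive ExampleIP_Resource ocf:heartbeat:IPaddr2 \
        params ip="192.168.100.50" \
        meta migration-threshold="10" failure-timeout="120s" \
        op monitor interval="20s" timeout="20s" on-fail="restart"
```

Each failed monitor increments the per-node fail-count; once it reaches migration-threshold (10 here), that node is scored -INFINITY for the resource and it moves to the other node. The fail-count can be inspected with the crm_failcount tool and reset with "crm resource cleanup".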
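The rc=99 confusion in this thread comes down to the LSB init-script conventions that Pacemaker relies on for lsb: resources: for the "status" action, exit 0 means "running", exit 3 means "not running", and other codes (such as 1) are errors; an arbitrary code like 99 is out of spec. A minimal sketch of such a status check, using a hypothetical pidfile path (not Tom's actual mysql-monitor-agent script):

```shell
#!/bin/sh
# Sketch of an LSB-style "status" action. The pidfile path is a
# made-up example, not the real mysql-monitor-agent script.
# LSB "status" exit codes: 0 = running, 3 = not running,
# anything else is treated as an error by Pacemaker.
PIDFILE="${PIDFILE:-/tmp/mysql-monitor-agent.pid}"

status() {
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        return 0    # process alive -> "running"
    elif [ -f "$PIDFILE" ]; then
        return 1    # stale pidfile, process gone -> generic error
    else
        return 3    # cleanly stopped -> "not running"
    fi
}

# With no pidfile present, status reports "not running" (rc=3).
rm -f "$PIDFILE"
if status; then
    echo "status rc=0 (running)"
else
    echo "status rc=$? (not running or error)"    # prints rc=3 here
fi
```

Returning 3 (rather than 99) when the service is down lets the cluster distinguish a cleanly stopped resource from a genuine failure.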