Hi,

On Wed, Mar 17, 2010 at 10:57:16AM +0100, Tom Tux wrote:
> Hi Dominik
>
> The problem is that the cluster does not run the monitor action every
> 10s. The last time it ran the action was at 09:21, and now it is
> 10:37:

There was a serious bug in some cluster-glue packages, and what
you're experiencing sounds like it. I can't say exactly which
packages (probably something like 1.0.1; they were never released).
At any rate, I'd suggest upgrading to cluster-glue 1.0.3.

Thanks,

Dejan

> MySQL_MonitorAgent_Resource: migration-threshold=3
>     + (479) stop: last-rc-change='Wed Mar 17 09:21:28 2010'
>       last-run='Wed Mar 17 09:21:28 2010' exec-time=3010ms queue-time=0ms
>       rc=0 (ok)
>     + (480) start: last-rc-change='Wed Mar 17 09:21:31 2010'
>       last-run='Wed Mar 17 09:21:31 2010' exec-time=3010ms queue-time=0ms
>       rc=0 (ok)
>     + (481) monitor: interval=10000ms last-rc-change='Wed Mar 17
>       09:21:34 2010' last-run='Wed Mar 17 09:21:34 2010' exec-time=20ms
>       queue-time=0ms rc=0 (ok)
>
> If I restart the whole cluster, the new return code (exit 99 or
> exit 4) is seen by the cluster monitor.
>
>
> 2010/3/17 Dominik Klein <d...@in-telegence.net>:
> > Hi Tom
> >
> > Have a look at the logs and see whether the monitor op really returns
> > 99 (grep for the resource id). If so, I'm not sure what the cluster
> > does with rc=99. As far as I know, rc=4 would be status=failed
> > (actually "unknown").
> >
> > Regards
> > Dominik
> >
> > Tom Tux wrote:
> >> Thanks for your hint.
> >>
> >> I've configured an LSB resource like this (with migration-threshold):
> >>
> >> primitive MySQL_MonitorAgent_Resource lsb:mysql-monitor-agent \
> >>         meta target-role="Started" migration-threshold="3" \
> >>         op monitor interval="10s" timeout="20s" on-fail="restart"
> >>
> >> I have now modified the init script "/etc/init.d/mysql-monitor-agent"
> >> to exit with a non-zero return code (e.g. exit 99) when the monitor
> >> operation queries the status. But the cluster does not recognise the
> >> failed monitor action. Why? As far as the cluster is concerned,
> >> everything seems ok.
> >>
> >> node1:/ # showcores.sh MySQL_MonitorAgent_Resource
> >> Resource                     Score     Node   Stickiness  #Fail  Migration-Threshold
> >> MySQL_MonitorAgent_Resource  -1000000  node1  100         0      3
> >> MySQL_MonitorAgent_Resource  100       node2  100         0      3
> >>
> >> I also noticed that the "last-run" entry (crm_mon -fort1) for this
> >> resource is not up to date. It seems to me that the monitor action
> >> does not run every 10 seconds. Why? Any hints?
> >>
> >> Thanks a lot.
> >> Tom
> >>
> >>
> >> 2010/3/16 Dominik Klein <d...@in-telegence.net>:
> >>> Tom Tux wrote:
> >>>> Hi
> >>>>
> >>>> I have a question about resource monitoring:
> >>>> I'm monitoring an IP resource every 20 seconds. I have configured the
> >>>> "On Fail" action as "restart". This works fine: if the "monitor"
> >>>> operation fails, the resource is restarted.
> >>>>
> >>>> But how can I configure this resource to migrate to the other node if
> >>>> it still fails after 10 restarts? Is this possible? How does the
> >>>> "failcount" interact with this scenario?
> >>>>
> >>>> In the documentation I read that the resource "fail_count"
> >>>> increases every time the resource restarts. But I can't see this
> >>>> fail_count.
> >>> Look at the meta attribute "migration-threshold".
> >>>
> >>> Regards
> >>> Dominik
> >
> >
> > _______________________________________________
> > Pacemaker mailing list
> > Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
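
The interplay between a failing status exit code, the fail-count and migration-threshold discussed in this thread can be sketched as a simplified, hypothetical Python model (not Pacemaker code). It assumes that an LSB status of 0 means running, 3 means not running, and any other value (such as the exit 99 or exit 4 tried above) counts as a failed monitor. It further assumes that every unexpected result increments the fail-count, and that once the fail-count reaches migration-threshold the node is treated as ineligible and the resource migrates. `classify_lsb_status` and `ResourceModel` are illustrative names only, and the model does not capture the cluster-glue bug that kept the monitor from running at all.

```python
from dataclasses import dataclass

# LSB "status" exit codes as listed in the LSB core specification
# for init scripts (for reference only; not used by the logic below).
LSB_STATUS_CODES = {
    0: "program is running or service is OK",
    1: "program is dead and /var/run pid file exists",
    2: "program is dead and /var/lock lock file exists",
    3: "program is not running",
    4: "program or service status is unknown",
}


def classify_lsb_status(rc: int) -> str:
    """Simplified classification (assumption): 0 = running,
    3 = cleanly stopped, anything else (1, 2, 4, 99, ...) = failed."""
    if rc == 0:
        return "running"
    if rc == 3:
        return "stopped"
    return "failed"


@dataclass
class ResourceModel:
    """Hypothetical model of one resource on one node; not Pacemaker code."""
    migration_threshold: int
    fail_count: int = 0

    def on_monitor(self, rc: int) -> str:
        """Return what a simplified on-fail="restart" policy would do after
        a recurring monitor returns rc while the resource should be running."""
        if classify_lsb_status(rc) == "running":
            return "none"
        # "stopped" and "failed" are both unexpected for a running
        # resource, so each one counts as a failure.
        self.fail_count += 1
        if self.fail_count >= self.migration_threshold:
            # This node is now ineligible; the resource moves elsewhere.
            return "migrate"
        return "restart"


if __name__ == "__main__":
    # The original question: migrate once 10 failures have occurred.
    res = ResourceModel(migration_threshold=10)
    actions = [res.on_monitor(99) for _ in range(10)]
    print(actions.count("restart"), actions[-1])  # 9 migrate
```

Under these assumptions, migration-threshold=10 yields nine restarts followed by a migration on the tenth failure. In the real cluster, the fail-count the model tracks is what `crm_mon -f` displays.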