Hi,

and first of all thanks for answering so far.


On 12.08.2010 18:46, Dejan Muhamedagic wrote:

> The migration-threshold shouldn't in any way influence resources
> which don't depend on the resource which fails over. Couldn't
> reproduce it here with our example RAs.
Well - just to clearly establish that something is wrong there, whatever it is, a simple misconfiguration or a possible bug - I now did a crm configure erase, completely restarted both nodes, and then set up this new, very simple, Dummy-based configuration:
v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v
node alpha \
        attributes standby="off"
node beta \
        attributes standby="off"
primitive dlm ocf:heartbeat:Dummy
primitive drbd ocf:heartbeat:Dummy
primitive mount ocf:heartbeat:Dummy
primitive mysql ocf:heartbeat:Dummy \
        meta migration-threshold="3" failure-timeout="40"
primitive o2cb ocf:heartbeat:Dummy
location cli-prefer-mount mount \
        rule $id="cli-prefer-rule-mount" inf: #uname eq alpha
colocation colocMysql inf: mysql mount
order orderMysql inf: mount mysql
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-unknown" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        cluster-recheck-interval="150" \
        last-lrm-refresh="1281751924"
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
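(Just as a sanity check - a sketch, assuming the crm_verify and crm shell of this Pacemaker 1.0.x install - the new configuration can be checked against the live CIB like this:)

crm_verify -L -V        # verify the live CIB, verbose
crm configure show      # show the configuration the cluster actually loaded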
...and then, picking on the resource "mysql" (as sketched below), got the following results:
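For each of the numbered steps below, the failure was injected and the fail-count (FC) read back roughly like this - just a sketch; the pause between the commands is arbitrary and the crm_failcount call is how I assume the counter can be queried:

for step in 1 2 3 4; do
    crm_resource -F -r mysql -H alpha      # fake a monitor failure of mysql on alpha
    sleep 5                                # give the cluster a moment to react
    crm_failcount -G -r mysql -U alpha     # read back FC(mysql) on alpha
done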

1) alpha: FC(mysql)=0, crm_resource -F -r mysql -H alpha
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_asyncmon_0 (call=48, rc=1, cib-update=563, confirmed=false) unknown error
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_stop_0 (call=49, rc=0, cib-update=565, confirmed=true) ok
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_start_0 (call=50, rc=0, cib-update=567, confirmed=true) ok

2) alpha: FC(mysql)=1, crm_resource -F -r mysql -H alpha
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_asyncmon_0 (call=51, rc=1, cib-update=568, confirmed=false) unknown error
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_stop_0 (call=52, rc=0, cib-update=572, confirmed=true) ok
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_start_0 (call=53, rc=0, cib-update=573, confirmed=true) ok

3) alpha: FC(mysql)=2, crm_resource -F -r mysql -H alpha
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_asyncmon_0 (call=54, rc=1, cib-update=574, confirmed=false) unknown error
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_stop_0 (call=55, rc=0, cib-update=576, confirmed=true) ok
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM operation mount_stop_0 (call=56, rc=0, cib-update=578, confirmed=true) ok
beta: FC(mysql)=3
Aug 14 04:15:56 beta crmd: [868]: info: process_lrm_event: LRM operation mount_start_0 (call=36, rc=0, cib-update=92, confirmed=true) ok
Aug 14 04:15:56 beta crmd: [868]: info: process_lrm_event: LRM operation mysql_start_0 (call=37, rc=0, cib-update=93, confirmed=true) ok
Aug 14 04:18:26 beta crmd: [868]: info: process_lrm_event: LRM operation mysql_stop_0 (call=38, rc=0, cib-update=94, confirmed=true) ok
Aug 14 04:18:26 beta crmd: [868]: info: process_lrm_event: LRM operation mount_stop_0 (call=39, rc=0, cib-update=95, confirmed=true) ok
alpha: FC(mysql)=3
Aug 14 04:18:26 alpha crmd: [900]: info: process_lrm_event: LRM operation mount_start_0 (call=57, rc=0, cib-update=580, confirmed=true) ok
Aug 14 04:18:26 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_start_0 (call=58, rc=0, cib-update=581, confirmed=true) ok


So it seems that - for whatever reason - those constrained resources are considered and treated just as if they were in a resource group: they move to wherever they can all run, instead of the dependent resource (mysql) simply "eating or dying" on the node of the underlying resource (mount), which is what I had expected from the constraints as I set them... or shouldn't I have?! o_O
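(To see why, one could also look at the allocation scores - a sketch, assuming the ptest utility of this Pacemaker 1.0.x install supports the live-check and show-scores switches:)

ptest -L -s | grep -E 'mysql|mount'      # dump allocation scores from the live CIB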


And - concerning the failure-timeout - quite a while later, without having reset mysql's failure counter or having done anything else in the meantime:

4) alpha: FC(mysql)=3, crm_resource -F -r mysql -H alpha
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_asyncmon_0 (call=59, rc=1, cib-update=592, confirmed=false) unknown error
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_stop_0 (call=60, rc=0, cib-update=596, confirmed=true) ok
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM operation mount_stop_0 (call=61, rc=0, cib-update=597, confirmed=true) ok
beta: FC(mysql)=0
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM operation mount_start_0 (call=40, rc=0, cib-update=96, confirmed=true) ok
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM operation mysql_start_0 (call=41, rc=0, cib-update=97, confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM operation mysql_stop_0 (call=42, rc=0, cib-update=98, confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM operation mount_stop_0 (call=43, rc=0, cib-update=99, confirmed=true) ok
alpha: FC(mysql)=4
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM operation mount_start_0 (call=62, rc=0, cib-update=599, confirmed=true) ok
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_start_0 (call=63, rc=0, cib-update=600, confirmed=true) ok
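(For comparison - just a sketch of what I assume the manual way would be - the counter could also be cleared by hand instead of waiting for the failure-timeout:)

crm_failcount -D -r mysql -U alpha      # delete the fail-count for mysql on alpha
crm resource cleanup mysql              # or clean up mysql's failed operations via the crm shell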

> BTW, what's the point of cloneMountMysql? If it can run only
> where drbd is master, then it can run on one node only:
>
> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
> order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
It's a dual-primary DRBD configuration, so when everything is OK (-; there are actually two masters of each DRBD multi-state resource. I admit that the dual primary (i.e. master) for msDrbdMysql is currently quite redundant, since the current cluster configuration contains only one primitive MySQL resource, so there is no strict need for MySQL's data directory to be mounted on both nodes at all times. But since it isn't harmful to have it mounted on the other node too, since msDrbdOpencms and msDrbdShared do need to be mounted on both nodes, and since I put the complete installation and configuration of the cluster into flexibly configurable shell scripts, it is easier - i.e. less typing - to generate all the DRBD and mount resources' configuration in one common loop (see the sketch below). (-;
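Roughly along these lines - only a sketch of the idea; the DRBD resource names, devices, mount points, fstype and meta attributes are placeholders, not the real values:

#!/bin/sh
# sketch of the "one common loop": create the DRBD master/slave resources and
# the cloned mounts for all three filesystems in the same way
for name in Mysql Opencms Shared; do
    lc=$(echo "$name" | tr 'A-Z' 'a-z')
    crm configure primitive drbd${name} ocf:linbit:drbd \
        params drbd_resource="${lc}"
    crm configure ms msDrbd${name} drbd${name} \
        meta master-max="2" clone-max="2" notify="true"
    crm configure primitive mount${name} ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/${lc}" directory="/srv/${lc}" fstype="ocfs2"
    crm configure clone cloneMount${name} mount${name} \
        meta interleave="true"
    crm configure colocation colocMount${name}_drbd inf: cloneMount${name} msDrbd${name}:Master
    crm configure order orderMount${name}_drbd inf: msDrbd${name}:promote cloneMount${name}:start
done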

>> d) I also have the impression that fail-counters don't get reset
>> after their failure-timeout, because when migration-threshold=3 is
>> set, upon every(!) following picking-on those issues occur, even
>> when I've waited for nearly 5 minutes (with failure-timeout=90)
>> without touching the cluster at all.
>
> That seems to be a bug though I couldn't reproduce it with a
> simple configuration.
I also just tested this again: it seems that failure-timeout only resets the scores from -inf back to around 0 (wherever they would normally be), allowing the resources to move back to the node. I tested this by setting a location constraint for the underlying resource (see the configuration above): after the failure-timeout has expired, on the next cluster-recheck (and only then!) the underlying resource and its dependents return to the underlying resource's preferred location, as you can see in the logs above.
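(What I might try next - just an idea, the value is arbitrary - is lowering the recheck interval so that an expired failure-timeout is noticed sooner:)

crm configure property cluster-recheck-interval="60"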

