Hi,

and first of all thanks for answering so far.


On 12.08.2010 18:46, Dejan Muhamedagic wrote:

> The migration-threshold shouldn't in any way influence resources
> which don't depend on the resource which fails over. Couldn't
> reproduce it here with our example RAs.
Well - just to clearly establish that something is wrong there, whatever it is, a simple misconfiguration or a possible bug - I now did a crm configure erase, completely restarted both nodes, and then set up this new, very simple, Dummy-based configuration:
v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v
node alpha \
        attributes standby="off"
node beta \
        attributes standby="off"
primitive dlm ocf:heartbeat:Dummy
primitive drbd ocf:heartbeat:Dummy
primitive mount ocf:heartbeat:Dummy
primitive mysql ocf:heartbeat:Dummy \
        meta migration-threshold="3" failure-timeout="40"
primitive o2cb ocf:heartbeat:Dummy
location cli-prefer-mount mount \
        rule $id="cli-prefer-rule-mount" inf: #uname eq alpha
colocation colocMysql inf: mysql mount
order orderMysql inf: mount mysql
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-unknown" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        cluster-recheck-interval="150" \
        last-lrm-refresh="1281751924"
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
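(Just as a sanity check - a sketch, assuming the crm_verify and crm shell of this Pacemaker 1.0.x install - the new configuration can be checked against the live CIB like this:)

crm_verify -L -V        # verify the live CIB, verbose
crm configure show      # show the configuration the cluster actually loaded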
...and then, picking on the resource "mysql" (as sketched below), got the following results:
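For each of the numbered steps below, the failure was injected and the fail-count (FC) read back roughly like this - just a sketch; the pause between the commands is arbitrary and the crm_failcount call is how I assume the counter can be queried:

for step in 1 2 3 4; do
    crm_resource -F -r mysql -H alpha      # fake a monitor failure of mysql on alpha
    sleep 5                                # give the cluster a moment to react
    crm_failcount -G -r mysql -U alpha     # read back FC(mysql) on alpha
done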

1) alpha: FC(mysql)=0, crm_resource -F -r mysql -H alpha
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_asyncmon_0 (call=48, rc=1, cib-update=563, confirmed=false) unknown error
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_stop_0 (call=49, rc=0, cib-update=565, confirmed=true) ok
Aug 14 04:15:30 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_start_0 (call=50, rc=0, cib-update=567, confirmed=true) ok

2) alpha: FC(mysql)=1, crm_resource -F -r mysql -H alpha
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_asyncmon_0 (call=51, rc=1, cib-update=568, confirmed=false) unknown error
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_stop_0 (call=52, rc=0, cib-update=572, confirmed=true) ok
Aug 14 04:15:42 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_start_0 (call=53, rc=0, cib-update=573, confirmed=true) ok

3) alpha: FC(mysql)=2, crm_resource -F -r mysql -H alpha
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_asyncmon_0 (call=54, rc=1, cib-update=574, confirmed=false) unknown error
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_stop_0 (call=55, rc=0, cib-update=576, confirmed=true) ok
Aug 14 04:15:56 alpha crmd: [900]: info: process_lrm_event: LRM operation mount_stop_0 (call=56, rc=0, cib-update=578, confirmed=true) ok
beta: FC(mysql)=3
Aug 14 04:15:56 beta crmd: [868]: info: process_lrm_event: LRM operation mount_start_0 (call=36, rc=0, cib-update=92, confirmed=true) ok
Aug 14 04:15:56 beta crmd: [868]: info: process_lrm_event: LRM operation mysql_start_0 (call=37, rc=0, cib-update=93, confirmed=true) ok
Aug 14 04:18:26 beta crmd: [868]: info: process_lrm_event: LRM operation mysql_stop_0 (call=38, rc=0, cib-update=94, confirmed=true) ok
Aug 14 04:18:26 beta crmd: [868]: info: process_lrm_event: LRM operation mount_stop_0 (call=39, rc=0, cib-update=95, confirmed=true) ok
alpha: FC(mysql)=3
Aug 14 04:18:26 alpha crmd: [900]: info: process_lrm_event: LRM operation mount_start_0 (call=57, rc=0, cib-update=580, confirmed=true) ok
Aug 14 04:18:26 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_start_0 (call=58, rc=0, cib-update=581, confirmed=true) ok


So it seems that - for whatever reason - those constrained resources are considered and treated just as if they were in a resource group: they move to wherever they can all run, instead of the dependent resource (mysql) simply "eating or dying" on the node of the underlying resource (mount), which is what I had expected from the constraints as I set them... or shouldn't I have?! o_O
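(To see why, one could also look at the allocation scores - a sketch, assuming the ptest utility of this Pacemaker 1.0.x install supports the live-check and show-scores switches:)

ptest -L -s | grep -E 'mysql|mount'      # dump allocation scores from the live CIB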


And - concerning the failure-timeout - quite a while later, without having reset mysql's failure counter or having done anything else in the meantime:

4) alpha: FC(mysql)=3, crm_resource -F -r mysql -H alpha
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_asyncmon_0 (call=59, rc=1, cib-update=592, confirmed=false) unknown error
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_stop_0 (call=60, rc=0, cib-update=596, confirmed=true) ok
Aug 14 04:44:47 alpha crmd: [900]: info: process_lrm_event: LRM operation mount_stop_0 (call=61, rc=0, cib-update=597, confirmed=true) ok
beta: FC(mysql)=0
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM operation mount_start_0 (call=40, rc=0, cib-update=96, confirmed=true) ok
Aug 14 04:44:47 beta crmd: [868]: info: process_lrm_event: LRM operation mysql_start_0 (call=41, rc=0, cib-update=97, confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM operation mysql_stop_0 (call=42, rc=0, cib-update=98, confirmed=true) ok
Aug 14 04:47:17 beta crmd: [868]: info: process_lrm_event: LRM operation mount_stop_0 (call=43, rc=0, cib-update=99, confirmed=true) ok
alpha: FC(mysql)=4
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM operation mount_start_0 (call=62, rc=0, cib-update=599, confirmed=true) ok
Aug 14 04:47:17 alpha crmd: [900]: info: process_lrm_event: LRM operation mysql_start_0 (call=63, rc=0, cib-update=600, confirmed=true) ok
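(For comparison - just a sketch of what I assume the manual way would be - the counter could also be cleared by hand instead of waiting for the failure-timeout:)

crm_failcount -D -r mysql -U alpha      # delete the fail-count for mysql on alpha
crm resource cleanup mysql              # or clean up mysql's failed operations via the crm shell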

> BTW, what's the point of cloneMountMysql? If it can run only
> where drbd is master, then it can run on one node only:
>
> colocation colocMountMysql_drbd inf: cloneMountMysql msDrbdMysql:Master
> order orderMountMysql_drbd inf: msDrbdMysql:promote cloneMountMysql:start
It's a dual-primary DRBD configuration, so when everything is OK (-; there are actually two masters of each DRBD multi-state resource. I admit that the dual primary (i.e. master) for msDrbdMysql is currently quite redundant, since the current cluster configuration contains only one primitive MySQL resource, so there is no strict need for MySQL's data directory to be mounted on both nodes at all times. But since it isn't harmful to have it mounted on the other node too, since msDrbdOpencms and msDrbdShared do need to be mounted on both nodes, and since I put the complete installation and configuration of the cluster into flexibly configurable shell scripts, it is easier - i.e. less typing - to generate all the DRBD and mount resources' configuration in one common loop (see the sketch below). (-;
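Roughly along these lines - only a sketch of the idea; the DRBD resource names, devices, mount points, fstype and meta attributes are placeholders, not the real values:

#!/bin/sh
# sketch of the "one common loop": create the DRBD master/slave resources and
# the cloned mounts for all three filesystems in the same way
for name in Mysql Opencms Shared; do
    lc=$(echo "$name" | tr 'A-Z' 'a-z')
    crm configure primitive drbd${name} ocf:linbit:drbd \
        params drbd_resource="${lc}"
    crm configure ms msDrbd${name} drbd${name} \
        meta master-max="2" clone-max="2" notify="true"
    crm configure primitive mount${name} ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/${lc}" directory="/srv/${lc}" fstype="ocfs2"
    crm configure clone cloneMount${name} mount${name} \
        meta interleave="true"
    crm configure colocation colocMount${name}_drbd inf: cloneMount${name} msDrbd${name}:Master
    crm configure order orderMount${name}_drbd inf: msDrbd${name}:promote cloneMount${name}:start
done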

>> d) I also have the impression that fail-counters don't get reset
>> after their failure-timeout, because when migration-threshold=3 is
>> set, upon every(!) following picking-on those issues occur, even
>> when I've waited for nearly 5 minutes (with failure-timeout=90)
>> without touching the cluster at all.
>
> That seems to be a bug though I couldn't reproduce it with a
> simple configuration.
I also just tested this again: it seems that failure-timeout only resets the scores from -inf back to around 0 (wherever they would normally be), allowing the resources to move back to the node. I tested this by setting a location constraint for the underlying resource (see the configuration above): after the failure-timeout has expired, on the next cluster-recheck (and only then!) the underlying resource and its dependents return to the underlying resource's preferred location, as you can see in the logs above.
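(What I might try next - just an idea, the value is arbitrary - is lowering the recheck interval so that an expired failure-timeout is noticed sooner:)

crm configure property cluster-recheck-interval="60"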

