Hi list,
Thanks for the input so far. Here are my new findings.
> > meta master-max="1" master-node-max="1" clone-max="2"
> > clone-node-max="1" notify="true" target-role="Master">
> > location location-groupMysql-on-node1 groupMysql inf: halab3
>
> So you have a "mandatory" location constraint saying
> run this thing only on halab3
>
You're right, that's not what I want.
> Remove the inf: halab3, or replace it with some not infinite score.
OK, that's done! Here is the "crm configure show" output from another cluster,
configured the same way as the halab3 one, on which
"crm resource move groupApache nodeha2" does not work:
node nodeha1
node nodeha2
primitive resApache ocf:heartbeat:apache \
        params configfile="/etc/apache2/apache2.conf" statusurl="http://localhost/server-status" \
        op monitor interval="1min" \
        op start interval="0" timeout="40" \
        op stop interval="0" timeout="60"
primitive resDRBDApache ocf:linbit:drbd \
        params drbd_resource="www-data" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100"
primitive resDRBDPostgresql ocf:linbit:drbd \
        params drbd_resource="postgresql" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100"
primitive resFsApache ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/www-data" directory="/home/www-data" fstype="ext4" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
primitive resFsPostgresql ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/postgresql" directory="/var/lib/postgresql" fstype="ext4" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
primitive resIPApache ocf:heartbeat:IPaddr2 \
        params ip="178.209.1.10" nic="eth0" cidr_netmask="28" \
        op monitor interval="30s"
primitive resIPPostgresql ocf:heartbeat:IPaddr2 \
        params ip="178.209.1.11" nic="eth0" cidr_netmask="28" \
        op monitor interval="30s"
primitive resPostgresql ocf:heartbeat:pgsql \
        params pgctl="/usr/lib/postgresql/8.4/bin/pg_ctl" psql="/usr/lib/postgresql/8.4/bin/psql" pgdata="/var/lib/postgresql/8.4/main" pghost="178.209.1.11" config="/etc/postgresql/8.4/main/postgresql.conf" logfile="/var/log/postgresql/postgresql-8.4-main.log" pgdb="template1" monitor_user="monitor" monitor_password="123" \
        op monitor interval="30" timeout="30" depth="0" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120"
group groupApache resFsApache resIPApache resApache
group groupPostgresql resFsPostgresql resIPPostgresql resPostgresql
ms msResDRBDApache resDRBDApache \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
ms msResDRBDPostgresql resDRBDPostgresql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
location location-groupApache-on-node1 groupApache 50: nodeha1
location location-groupPostgresql-on-node1 groupPostgresql 50: nodeha1
colocation colo-groupApache-msResDRBDApache inf: groupApache msResDRBDApache:Master
colocation colo-groupPostgresql-msResDRBDPostgresql inf: groupPostgresql msResDRBDPostgresql:Master
order orderGroupApache-after-msResDRBDApache inf: msResDRBDApache:promote groupApache:start
order orderGroupPostgresql-after-msResDRBDPostgresql inf: msResDRBDPostgresql:promote groupPostgresql:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        last-lrm-refresh="1343987736"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
Before "crm resource move groupApache nodeha2":
./showscores.sh
Resource                      Score      Node         Stickiness  #Fail  Migration-Threshold
resApache                     100        clientisha1  100         0
resApache                     -INFINITY  clientisha2  100         0
resDRBDApache:0               0          clientisha2  100         0
resDRBDApache:0               10100      clientisha1  100         0
resDRBDApache:0_(master)      10700      clientisha1  100         0
resDRBDApache:1               100        clientisha2  100         0
resDRBDApache:1               -INFINITY  clientisha1  100         0
resDRBDApache:1_(master)      -1         clientisha2  100         0
resDRBDPostgresql:0           0          clientisha2  100         0
resDRBDPostgresql:0           10100      clientisha1  100         0
resDRBDPostgresql:0_(master)  10700      clientisha1  100         0
resDRBDPostgresql:1           100        clientisha2  100         0
resDRBDPostgresql:1           -INFINITY  clientisha1  100         0
resDRBDPostgresql:1_(master)  -1         clientisha2  100         0
resFsApache                   10450      clientisha1  100         0
resFsApache                   -INFINITY  clientisha2  100         0
resFsPostgresql               10450      clientisha1  100         0
resFsPostgresql               -INFINITY  clientisha2  100         0
resIPApache                   200        clientisha1  100         0
resIPApache                   -INFINITY  clientisha2  100         0
resIPPostgresql               200        clientisha1  100         0
resIPPostgresql               -INFINITY  clientisha2  100         0
resPostgresql                 100        clientisha1  100         0
resPostgresql                 -INFINITY  clientisha2  100         0
After "crm resource move groupApache nodeha2":
The following constraint is added:
location cli-prefer-groupApache groupApache \
        rule $id="cli-prefer-rule-groupApache" inf: #uname eq nodeha2
./showscores.sh
Resource                      Score      Node         Stickiness  #Fail  Migration-Threshold
resApache                     100        clientisha1  100         0
resApache                     -INFINITY  clientisha2  100         0
resDRBDApache:0               0          clientisha2  100         0
resDRBDApache:0               10100      clientisha1  100         0
resDRBDApache:0_(master)      10700      clientisha1  100         0
resDRBDApache:1               100        clientisha2  100         0
resDRBDApache:1               -INFINITY  clientisha1  100         0
resDRBDApache:1_(master)      -1         clientisha2  100         0
resDRBDPostgresql:0           0          clientisha2  100         0
resDRBDPostgresql:0           10100      clientisha1  100         0
resDRBDPostgresql:0_(master)  10700      clientisha1  100         0
resDRBDPostgresql:1           100        clientisha2  100         0
resDRBDPostgresql:1           -INFINITY  clientisha1  100         0
resDRBDPostgresql:1_(master)  -1         clientisha2  100         0
resFsApache                   10450      clientisha1  100         0
resFsApache                   -INFINITY  clientisha2  100         0
resFsPostgresql               10450      clientisha1  100         0
resFsPostgresql               -INFINITY  clientisha2  100         0
resIPApache                   200        clientisha1  100         0
resIPApache                   -INFINITY  clientisha2  100         0
resIPPostgresql               200        clientisha1  100         0
resIPPostgresql               -INFINITY  clientisha2  100         0
resPostgresql                 100        clientisha1  100         0
resPostgresql                 -INFINITY  clientisha2  100         0
The scores do not change at all: the output is identical before and after the move.
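To compare the two tables mechanically instead of by eye, something like the following could be used. This is only a minimal sketch in Python: it assumes both showscores.sh outputs were saved as text, and parse_scores and diff_scores are hypothetical helper names, not part of any Pacemaker tool.

```python
def parse_scores(text):
    """Parse showscores.sh-style output into {(resource, node): score}."""
    scores = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows carry at least resource, score and node; skip headers and blanks.
        if len(fields) < 3 or fields[0] == "Resource":
            continue
        resource, score, node = fields[0], fields[1], fields[2]
        scores[(resource, node)] = score
    return scores


def diff_scores(before, after):
    """Return {(resource, node): (old, new)} for every entry that differs."""
    old, new = parse_scores(before), parse_scores(after)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


# Two rows taken from the tables in this mail, used as sample input.
before = """\
Resource Score Node Stickiness #Fail
Migration-Threshold
resApache 100 clientisha1 100 0
resApache -INFINITY clientisha2 100 0
"""
after = before  # in my case the rows are the same after the move

print(diff_scores(before, after))
```

An empty result means no score changed, which matches what I see above.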
The log shows the following:
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib_process_request: Operation complete: op cib_delete for section constraints (origin=nodeha1/crm_resource/3, version=0.69.2): ok (rc=0)
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: - <cib admin_epoch="0" epoch="69" num_updates="2" />
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <cib epoch="70" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="nodeha1" update-client="crm_resource" cib-last-written="Fri Aug 3 16:31:40 2012" have-quorum="1" dc-uuid="nodeha2" >
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <configuration >
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <constraints >
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <rsc_location id="cli-prefer-groupApache" rsc="groupApache" __crm_diff_marker__="added:top" >
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <rule id="cli-prefer-rule-groupApache" score="INFINITY" boolean-op="and" >
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <expression id="cli-prefer-expr-groupApache" attribute="#uname" operation="eq" value="nodeha2" type="string" />
Aug 03 16:33:11 nodeha2 crmd: [4173]: info: abort_transition_graph: te_update_diff:126 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.70.1) : Non-status change
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </rule>
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </rsc_location>
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </constraints>
Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </configuration>
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </cib>
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib_process_request: Operation complete: op cib_modify for section constraints (origin=nodeha1/crm_resource/4, version=0.70.1): ok (rc=0)
Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: unpack_config: On loss of CCM Quorum: Ignore
Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: unpack_rsc_op: Operation monitor found resource resDRBDPostgresql:0 active in master mode on nodeha1
Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: unpack_rsc_op: Operation monitor found resource resDRBDApache:0 active in master mode on nodeha1
Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Aug 03 16:33:11 nodeha2 crmd: [4173]: info: do_te_invoke: Processing graph 332 (ref=pe_calc-dc-1344004391-579) derived from /var/lib/pengine/pe-input-87.bz2
Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: run_graph: ==== Transition 332 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-87.bz2): Complete
Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: process_pe_message: Transition 332: PEngine Input stored in: /var/lib/pengine/pe-input-87.bz2
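As far as I understand the run_graph summary, all counters being zero means the policy engine accepted the new constraint but scheduled no actions at all. A small Python sketch to pull those counters out of the log line; transition_counters is a hypothetical helper name and the regular expression is my own guess at the line format, based only on the entry above:

```python
import re


def transition_counters(log_line):
    """Extract the name=<integer> counters from a crmd run_graph summary line."""
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", log_line)}


# The run_graph entry from the log above, joined into one line.
line = (
    "Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: run_graph: ==== Transition 332 "
    "(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, "
    "Source=/var/lib/pengine/pe-input-87.bz2): Complete"
)

counters = transition_counters(line)
no_actions = all(value == 0 for value in counters.values())
print(counters, no_actions)
```

So the question is less why the move failed and more why the policy engine saw nothing to do.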
Maybe I need to clear some counters or score caches?
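One thing I still have to rule out is Pacemaker's score arithmetic. The documentation states that any value plus -INFINITY is -INFINITY, and that INFINITY plus -INFINITY is also -INFINITY. If the -INFINITY for the group on the second node comes from the mandatory colocation with the DRBD master (an assumption on my side, I have not verified it), then the inf: rule added by the move could never win. The sketch below only illustrates those documented rules in Python; it is not Pacemaker's code, and add_scores is a hypothetical name:

```python
INFINITY = 1_000_000  # Pacemaker defines INFINITY as 1000000


def add_scores(a, b):
    """Combine two scores following Pacemaker's documented infinity rules."""
    # -INFINITY wins over everything, including +INFINITY.
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    # Finite scores add up, capped at the infinity bounds.
    return max(-INFINITY, min(INFINITY, a + b))


# Assumed inputs: -INFINITY from the colocation, +INFINITY from cli-prefer-groupApache.
print(add_scores(-INFINITY, INFINITY) == -INFINITY)
```

If that is what happens here, clearing counters would not help, and the DRBD master itself would have to move first.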
> > How can I debug such problems?
>
> Experience helps ;-)
That's really true. And I'm actually in the process of gaining experience =)
Cheers,
Tobias
--
Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
Support +41 44 637 40 40 | Tel +41 44 637 40 00 | Direct +41 44 637 40 13
Skype nine.ch_support
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems