Hi,

On Fri, Aug 03, 2012 at 04:37:55PM +0200, Tobias Brunner wrote:
> Hi list,
> 
> Thanks for the input so far, here are new findings.
> 
> > >         meta master-max="1" master-node-max="1" clone-max="2"
> > >         clone-node-max="1" notify="true" target-role="Master"> 
> > > location location-groupMysql-on-node1 groupMysql inf: halab3
> > 
> > So you have a "mandatory" location constraint saying
> >     run this thing only on halab3
> > 
> 
> You're right, that's not what I want.
> 
> > Remove the inf: halab3, or replace it with some not infinite score.
> 
> Ok, that's done! Now here is a "crm configure show" from another cluster on
> which "crm resource move groupApache nodeha2" doesn't work (same
> configuration as halab3):
> 
> node nodeha1
> node nodeha2
> primitive resApache ocf:heartbeat:apache \
>         params configfile="/etc/apache2/apache2.conf" statusurl="http://localhost/server-status" \
>         op monitor interval="1min" \
>         op start interval="0" timeout="40" \
>         op stop interval="0" timeout="60"
> primitive resDRBDApache ocf:linbit:drbd \
>         params drbd_resource="www-data" \
>         op start interval="0" timeout="240" \
>         op stop interval="0" timeout="100"
> primitive resDRBDPostgresql ocf:linbit:drbd \
>         params drbd_resource="postgresql" \
>         op start interval="0" timeout="240" \
>         op stop interval="0" timeout="100"
> primitive resFsApache ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/www-data" directory="/home/www-data" fstype="ext4" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60"
> primitive resFsPostgresql ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/postgresql" directory="/var/lib/postgresql" fstype="ext4" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60"
> primitive resIPApache ocf:heartbeat:IPaddr2 \
>         params ip="178.209.1.10" nic="eth0" cidr_netmask="28" \
>         op monitor interval="30s"
> primitive resIPPostgresql ocf:heartbeat:IPaddr2 \
>         params ip="178.209.1.11" nic="eth0" cidr_netmask="28" \
>         op monitor interval="30s"
> primitive resPostgresql ocf:heartbeat:pgsql \
>         params pgctl="/usr/lib/postgresql/8.4/bin/pg_ctl" psql="/usr/lib/postgresql/8.4/bin/psql" pgdata="/var/lib/postgresql/8.4/main" pghost="178.209.1.11" config="/etc/postgresql/8.4/main/postgresql.conf" logfile="/var/log/postgresql/postgresql-8.4-main.log" pgdb="template1" monitor_user="monitor" monitor_password="123" \
>         op monitor interval="30" timeout="30" depth="0" \
>         op start interval="0" timeout="120" \
>         op stop interval="0" timeout="120"
> group groupApache resFsApache resIPApache resApache
> group groupPostgresql resFsPostgresql resIPPostgresql resPostgresql
> ms msResDRBDApache resDRBDApache \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> ms msResDRBDPostgresql resDRBDPostgresql \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> location location-groupApache-on-node1 groupApache 50: nodeha1
> location location-groupPostgresql-on-node1 groupPostgresql 50: nodeha1
> colocation colo-groupApache-msResDRBDApache inf: groupApache msResDRBDApache:Master
> colocation colo-groupPostgresql-msResDRBDPostgresql inf: groupPostgresql msResDRBDPostgresql:Master
> order orderGroupApache-after-msResDRBDApache inf: msResDRBDApache:promote groupApache:start
> order orderGroupPostgresql-after-msResDRBDPostgresql inf: msResDRBDPostgresql:promote groupPostgresql:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false" \
>         last-lrm-refresh="1343987736"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
> 
> 
> Before "crm resource move groupApache nodeha2":
> ./showscores.sh
> Resource                     Score     Node        Stickiness #Fail Migration-Threshold
> resApache                    100       clientisha1 100        0
> resApache                    -INFINITY clientisha2 100        0
> resDRBDApache:0              0         clientisha2 100        0
> resDRBDApache:0              10100     clientisha1 100        0
> resDRBDApache:0_(master)     10700     clientisha1 100        0
> resDRBDApache:1              100       clientisha2 100        0
> resDRBDApache:1              -INFINITY clientisha1 100        0
> resDRBDApache:1_(master)     -1        clientisha2 100        0
> resDRBDPostgresql:0          0         clientisha2 100        0
> resDRBDPostgresql:0          10100     clientisha1 100        0
> resDRBDPostgresql:0_(master) 10700     clientisha1 100        0
> resDRBDPostgresql:1          100       clientisha2 100        0
> resDRBDPostgresql:1          -INFINITY clientisha1 100        0
> resDRBDPostgresql:1_(master) -1        clientisha2 100        0
> resFsApache                  10450     clientisha1 100        0
> resFsApache                  -INFINITY clientisha2 100        0
> resFsPostgresql              10450     clientisha1 100        0
> resFsPostgresql              -INFINITY clientisha2 100        0
> resIPApache                  200       clientisha1 100        0
> resIPApache                  -INFINITY clientisha2 100        0
> resIPPostgresql              200       clientisha1 100        0
> resIPPostgresql              -INFINITY clientisha2 100        0
> resPostgresql                100       clientisha1 100        0
> resPostgresql                -INFINITY clientisha2 100        0

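abs(-inf) > inf: the -INFINITY scores on the second node win over the
INFINITY preference that the move adds, so nothing changes.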
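
For readers unfamiliar with that rule, here is a minimal sketch of how
two scores combine, assuming the documented Pacemaker convention that
INFINITY is 1,000,000 and that -INFINITY beats +INFINITY. The function
name add_scores is made up for illustration, and the real policy engine
does considerably more than this (colocation, stickiness, groups).

```python
INFINITY = 1_000_000  # Pacemaker treats scores at or beyond +/-1,000,000 as infinite


def add_scores(a: int, b: int) -> int:
    """Combine two scores using the infinity rules (illustrative only)."""
    # -INFINITY wins over everything, including +INFINITY.
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    # Otherwise +INFINITY absorbs any finite value.
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    # Finite scores add normally, clamped to the infinite range.
    return max(-INFINITY, min(INFINITY, a + b))


# The move adds INFINITY for the target node, but the resources already
# carry -INFINITY there, so the combined score stays -INFINITY.
print(add_scores(-INFINITY, INFINITY))  # -1000000
print(add_scores(100, 50))              # 150
```

This is why the scores look unchanged after the move: an INFINITY
preference cannot lift a node that is already at -INFINITY.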

I guess that you need to do a resource cleanup (for example
"crm resource cleanup groupApache") to remove the record of old
failures.
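
To see which resources are pinned away from the target node, the
showscores output can be filtered for -INFINITY rows. This is only a
rough sketch: it assumes the whitespace-separated column layout shown
in the quoted output (resource, score, node, ...), and the helper name
blocked_resources is hypothetical.

```python
def blocked_resources(showscores_output: str, node: str) -> list[str]:
    """Return resources whose score on `node` is -INFINITY."""
    blocked = []
    for line in showscores_output.splitlines():
        fields = line.split()
        # Assumed columns: resource, score, node, stickiness, fail count.
        if len(fields) >= 3 and fields[1] == "-INFINITY" and fields[2] == node:
            blocked.append(fields[0])
    return blocked


# A few rows copied from the quoted showscores output.
sample = """\
resApache                    100       clientisha1 100        0
resApache                    -INFINITY clientisha2 100        0
resFsApache                  10450     clientisha1 100        0
resFsApache                  -INFINITY clientisha2 100        0
resDRBDApache:1              -INFINITY clientisha1 100        0
"""

print(blocked_resources(sample, "clientisha2"))
```

If the list is still non-empty for the target node after a cleanup,
the -INFINITY comes from somewhere other than an old failure record.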

It's interesting that you have three sets of node names: nodeha1/2
in the config, clientisha1/2 in the showscores output, and halab3
in your own description. Somebody got confused.

Thanks,

Dejan

> After "crm resource move groupApache nodeha2":
> 
> The constraint is added:
> location cli-prefer-groupApache groupApache \
>         rule $id="cli-prefer-rule-groupApache" inf: #uname eq nodeha2
> 
> ./showscores.sh
> Resource                     Score     Node        Stickiness #Fail Migration-Threshold
> resApache                    100       clientisha1 100        0
> resApache                    -INFINITY clientisha2 100        0
> resDRBDApache:0              0         clientisha2 100        0
> resDRBDApache:0              10100     clientisha1 100        0
> resDRBDApache:0_(master)     10700     clientisha1 100        0
> resDRBDApache:1              100       clientisha2 100        0
> resDRBDApache:1              -INFINITY clientisha1 100        0
> resDRBDApache:1_(master)     -1        clientisha2 100        0
> resDRBDPostgresql:0          0         clientisha2 100        0
> resDRBDPostgresql:0          10100     clientisha1 100        0
> resDRBDPostgresql:0_(master) 10700     clientisha1 100        0
> resDRBDPostgresql:1          100       clientisha2 100        0
> resDRBDPostgresql:1          -INFINITY clientisha1 100        0
> resDRBDPostgresql:1_(master) -1        clientisha2 100        0
> resFsApache                  10450     clientisha1 100        0
> resFsApache                  -INFINITY clientisha2 100        0
> resFsPostgresql              10450     clientisha1 100        0
> resFsPostgresql              -INFINITY clientisha2 100        0
> resIPApache                  200       clientisha1 100        0
> resIPApache                  -INFINITY clientisha2 100        0
> resIPPostgresql              200       clientisha1 100        0
> resIPPostgresql              -INFINITY clientisha2 100        0
> resPostgresql                100       clientisha1 100        0
> resPostgresql                -INFINITY clientisha2 100        0
> 
> The scores don't look like they are changing.
> 
> The log looks like this:
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib_process_request: Operation complete: op cib_delete for section constraints (origin=nodeha1/crm_resource/3, version=0.69.2): ok (rc=0)
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: - <cib admin_epoch="0" epoch="69" num_updates="2" />
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <cib epoch="70" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="nodeha1" update-client="crm_resource" cib-last-written="Fri Aug  3 16:31:40 2012" have-quorum="1" dc-uuid="nodeha2" >
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: +   <configuration >
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: +     <constraints >
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: +       <rsc_location id="cli-prefer-groupApache" rsc="groupApache" __crm_diff_marker__="added:top" >
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: +         <rule id="cli-prefer-rule-groupApache" score="INFINITY" boolean-op="and" >
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: +           <expression id="cli-prefer-expr-groupApache" attribute="#uname" operation="eq" value="nodeha2" type="string" />
> Aug 03 16:33:11 nodeha2 crmd: [4173]: info: abort_transition_graph: te_update_diff:126 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.70.1) : Non-status change
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: +         </rule>
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: +       </rsc_location>
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: +     </constraints>
> Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: +   </configuration>
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </cib>
> Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib_process_request: Operation complete: op cib_modify for section constraints (origin=nodeha1/crm_resource/4, version=0.70.1): ok (rc=0)
> Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: unpack_rsc_op: Operation monitor found resource resDRBDPostgresql:0 active in master mode on nodeha1
> Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: unpack_rsc_op: Operation monitor found resource resDRBDApache:0 active in master mode on nodeha1
> Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Aug 03 16:33:11 nodeha2 crmd: [4173]: info: do_te_invoke: Processing graph 332 (ref=pe_calc-dc-1344004391-579) derived from /var/lib/pengine/pe-input-87.bz2
> Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: run_graph: ==== Transition 332 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-87.bz2): Complete
> Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: process_pe_message: Transition 332: PEngine Input stored in: /var/lib/pengine/pe-input-87.bz2
> 
> Maybe I need to clear some counters or score caches?
> 
> > > How can I debug such problems?
> > 
> > Experience helps ;-)
> 
> That's really true. And I'm actually in the process of gaining experience =)
> 
> Cheers,
> Tobias
> 
> -- 
> Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> Support +41 44 637 40 40 | Tel +41 44 637 40 00 | Direct +41 44 637 40 13
> Skype nine.ch_support
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
