Re: [Pacemaker] Unique clone instance is stopped too early on move

Andrew Beekhof Sun, 12 Apr 2015 23:19:05 -0700

> On 22 Jan 2015, at 12:04 am, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
> 
> 20.01.2015 02:44, Andrew Beekhof wrote:
>> 
>>> On 16 Jan 2015, at 3:59 pm, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
>>> 
>>> 16.01.2015 07:44, Andrew Beekhof wrote:
>>>> 
>>>>> On 15 Jan 2015, at 3:11 pm, Vladislav Bogdanov <bub...@hoster-ok.com> 
>>>>> wrote:
>>>>> 
>>>>> 13.01.2015 11:32, Andrei Borzenkov wrote:
>>>>>> On Tue, Jan 13, 2015 at 10:20 AM, Vladislav Bogdanov
>>>>>> <bub...@hoster-ok.com> wrote:
>>>>>>> Hi Andrew, David, all.
>>>>>>> 
>>>>>>> I found somewhat strange operation ordering during transition
>>>>>>> execution.
>>>>>>> 
>>>>>>> Could you please look at the following partial configuration (crmsh 
>>>>>>> syntax)?
>>>>>>> 
>>>>>>> ===
>>>>>>> ...
>>>>>>> clone cl-broker broker \
>>>>>>>         meta interleave=true target-role=Started
>>>>>>> clone cl-broker-vips broker-vips \
>>>>>>>         meta clone-node-max=2 globally-unique=true interleave=true resource-stickiness=0 target-role=Started
>>>>>>> clone cl-ctdb ctdb \
>>>>>>>         meta interleave=true target-role=Started
>>>>>>> colocation broker-vips-with-broker inf: cl-broker-vips cl-broker
>>>>>>> colocation broker-with-ctdb inf: cl-broker cl-ctdb
>>>>>>> order broker-after-ctdb inf: cl-ctdb cl-broker
>>>>>>> order broker-vips-after-broker 0: cl-broker cl-broker-vips
>>>>>>> ...
>>>>>>> ===
>>>>>>> 
>>>>>>> After I put one node into standby and then back online, I see the
>>>>>>> following transition (relevant excerpt):
>>>>>>> 
>>>>>>> ===
>>>>>>>  * Pseudo action:   cl-broker-vips_stop_0
>>>>>>>  * Resource action: broker-vips:1   stop on c-pa-0
>>>>>>>  * Pseudo action:   cl-broker-vips_stopped_0
>>>>>>>  * Pseudo action:   cl-ctdb_start_0
>>>>>>>  * Resource action: ctdb            start on c-pa-1
>>>>>>>  * Pseudo action:   cl-ctdb_running_0
>>>>>>>  * Pseudo action:   cl-broker_start_0
>>>>>>>  * Resource action: ctdb            monitor=10000 on c-pa-1
>>>>>>>  * Resource action: broker          start on c-pa-1
>>>>>>>  * Pseudo action:   cl-broker_running_0
>>>>>>>  * Pseudo action:   cl-broker-vips_start_0
>>>>>>>  * Resource action: broker          monitor=10000 on c-pa-1
>>>>>>>  * Resource action: broker-vips:1   start on c-pa-1
>>>>>>>  * Pseudo action:   cl-broker-vips_running_0
>>>>>>>  * Resource action: broker-vips:1   monitor=30000 on c-pa-1
>>>>>>> ===
>>>>>>> 
>>>>>>> What could be the reason for stopping a unique clone instance so early on a move?
>>>>>>> 
>>>>>> 
>>>>>> Do not take it as a definitive answer, but cl-broker-vips cannot run
>>>>>> unless both other resources are started. So if you compute the closure of
>>>>>> all required transitions, it looks rather logical. Having
>>>>>> cl-broker-vips started while broker is still stopped would violate the
>>>>>> constraint.
>>>>> 
>>>>> The problem is that broker-vips:1 is stopped on one (source) node
>>>>> unnecessarily early.
>>>> 
>>>> It looks to be moving from c-pa-0 to c-pa-1.
>>>> It might be unnecessarily early, but it is what you asked for... we have
>>>> to unwind the resource stack before we can build it up.
>>> 
>>> Yes, I understand that it is valid, but could its stop be delayed until
>>> the cluster is in a state where all dependencies are satisfied to start it
>>> on the other node (like migration)?
>> 
>> No, because "we have to unwind the resource stack before we can build it up."
>> Doing anything else would be one of those things that are trivial for a human
>> to identify but rather complex for a computer.
> 
> I believe there is also an issue with migration of clone instances.
> 
> I modified the pe-input to allow migration of cl-broker-vips (and also set an
> inf score for broker-vips-after-broker and made cl-broker-vips interleaved).
> The relevant part is:
> clone cl-broker broker \
>        meta interleave=true target-role=Started
> clone cl-broker-vips broker-vips \
>        meta clone-node-max=2 globally-unique=true interleave=true allow-migrate=true resource-stickiness=0 target-role=Started
> clone cl-ctdb ctdb \
>        meta interleave=true target-role=Started
> colocation broker-vips-with-broker inf: cl-broker-vips cl-broker
> colocation broker-with-ctdb inf: cl-broker cl-ctdb
> order broker-after-ctdb inf: cl-ctdb cl-broker
> order broker-vips-after-broker inf: cl-broker cl-broker-vips
> 
> After that, (part of) the transition is:
> 
> * Resource action: broker-vips:1   migrate_to on c-pa-0
> * Pseudo action:   cl-broker-vips_stop_0
> * Resource action: broker-vips:1   migrate_from on c-pa-1
> * Resource action: broker-vips:1   stop on c-pa-0
> * Pseudo action:   cl-broker-vips_stopped_0
> * Pseudo action:   all_stopped
> * Pseudo action:   cl-ctdb_start_0
> * Resource action: ctdb            start on c-pa-1
> * Pseudo action:   cl-ctdb_running_0
> * Pseudo action:   cl-broker_start_0
> * Resource action: ctdb            monitor=10000 on c-pa-1
> * Resource action: broker          start on c-pa-1
> * Pseudo action:   cl-broker_running_0
> * Pseudo action:   cl-broker-vips_start_0
> * Resource action: broker          monitor=10000 on c-pa-1
> * Pseudo action:   broker-vips:1_start_0
> * Pseudo action:   cl-broker-vips_running_0
> * Resource action: broker-vips:1   monitor=30000 on c-pa-1


Have you got the PE file for this?
I feel like we fixed something like this recently but I’d like to check it with 
your input.
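The ordering question in this thread can also be checked mechanically against the transition summaries. Below is a minimal Python sketch, assuming only the "* Resource action: <resource> <operation> on <node>" line format quoted here; the helpers parse, first_index and check are hypothetical and not part of Pacemaker or crm_simulate. For a resource moving from c-pa-0 to c-pa-1, it reports whether the resource leaves the source node (stop or migrate_to), or becomes active on the target (start or migrate_from), before broker is started on c-pa-1.

```python
import re

# Matches one "Resource action" line of a transition summary, e.g.
#   "* Resource action: broker          start on c-pa-1"
# Pseudo actions have no node, so they are skipped.
ACTION_RE = re.compile(r"\* Resource action:\s+(\S+)\s+(\S+) on (\S+)")


def parse(summary):
    """Return (resource, operation, node) tuples in the order listed."""
    matches = (ACTION_RE.search(line) for line in summary.splitlines())
    return [m.groups() for m in matches if m]


def first_index(actions, resource, operations, node):
    """Position of the first action of `resource` with an op in `operations` on `node`."""
    for i, (rsc, op, where) in enumerate(actions):
        if rsc == resource and op in operations and where == node:
            return i
    raise ValueError(f"no {sorted(operations)} of {resource} on {node}")


def check(summary, vip="broker-vips:1", src="c-pa-0", dst="c-pa-1"):
    """Is `vip` moved off `src`, or made active on `dst`, before broker starts on `dst`?"""
    actions = parse(summary)
    broker_start = first_index(actions, "broker", {"start"}, dst)
    left_src = first_index(actions, vip, {"stop", "migrate_to"}, src)
    on_dst = first_index(actions, vip, {"start", "migrate_from"}, dst)
    return {
        "left_source_early": left_src < broker_start,
        "active_on_target_early": on_dst < broker_start,
    }


# Resource actions copied from the quoted transitions (pseudo actions omitted).
STANDBY = """
* Resource action: broker-vips:1   stop on c-pa-0
* Resource action: ctdb            start on c-pa-1
* Resource action: ctdb            monitor=10000 on c-pa-1
* Resource action: broker          start on c-pa-1
* Resource action: broker          monitor=10000 on c-pa-1
* Resource action: broker-vips:1   start on c-pa-1
* Resource action: broker-vips:1   monitor=30000 on c-pa-1
"""

INJECTED = """
* Resource action: broker-vips:1   migrate_to on c-pa-0
* Resource action: broker-vips:1   migrate_from on c-pa-1
* Resource action: broker-vips:1   stop on c-pa-0
* Resource action: ctdb            start on c-pa-1
* Resource action: ctdb            monitor=10000 on c-pa-1
* Resource action: broker          start on c-pa-1
* Resource action: broker-vip2     migrate_to on c-pa-0
* Resource action: broker          monitor=10000 on c-pa-1
* Resource action: broker-vip2     migrate_from on c-pa-1
* Resource action: broker-vip2     stop on c-pa-0
* Resource action: broker-vips:1   monitor=30000 on c-pa-1
* Resource action: broker-vip2     monitor=30000 on c-pa-1
"""

if __name__ == "__main__":
    # broker-vips:1 without migration (standby/online transition)
    print(check(STANDBY))
    # broker-vips:1 with allow-migrate=true (pe-input with injected IPaddr2s)
    print(check(INJECTED))
    # migratable non-clone primitive in the same transition
    print(check(INJECTED, vip="broker-vip2"))
```

Run against the standby/online transition and the transition with the injected IPaddr2 primitives, it flags broker-vips:1 as leaving c-pa-0 before broker starts in both cases and, with migration enabled, as active on c-pa-1 before broker, while broker-vip2 is moved only after broker has started. It compares positions in the printed summary only; the real transition graph is partially ordered, so list position is an approximation of the actual dependencies.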

> 
> But I would say that, at least from a human-logic PoV, the above breaks the
> ordering rule broker-vips-after-broker (cl-broker-vips finishes migrating and
> thus runs on c-pa-1 before cl-broker is started there).
> Technically, broker-vips:1_start_0 is at the right position, but the resource
> is actually "started" in migrate_to/migrate_from.
> 
> 
> I also went further and injected a pair of non-clone IPaddr2 resources into
> the same pe-input, and also enabled migration for them (setting interleave
> for cl-broker-vips back to false and the ordering score for
> broker-vips-after-broker back to 0, so all three order constraints are
> analogous):
> 
> clone cl-broker broker \
>        meta interleave=true target-role=Started
> clone cl-broker-vips broker-vips \
>        meta clone-node-max=2 globally-unique=true interleave=false allow-migrate=true resource-stickiness=0 target-role=Started
> clone cl-ctdb ctdb \
>        meta interleave=true target-role=Started
> primitive broker-vip1 IPaddr2 \
>        params ip=192.168.122.70 cidr_netmask=24 nic=eth0 \
>        op start interval=0 timeout=20 \
>        op stop interval=0 timeout=20 \
>        op monitor interval=30
> primitive broker-vip2 IPaddr2 \
>        params ip=192.168.122.71 cidr_netmask=24 nic=eth0 \
>        op start interval=0 timeout=20 \
>        op stop interval=0 timeout=20 \
>        op monitor interval=30
> colocation broker-with-ctdb inf: cl-broker cl-ctdb
> colocation broker-vips-with-broker inf: cl-broker-vips cl-broker
> colocation broker-vip1-with-broker inf: broker-vip1 cl-broker
> colocation broker-vip2-with-broker inf: broker-vip2 cl-broker
> colocation broker-vip2-not-with-vip1 -100: broker-vip2 broker-vip1
> order broker-after-ctdb inf: cl-ctdb cl-broker
> order broker-vips-after-broker 0: cl-broker cl-broker-vips
> order broker-vip1-after-broker 0: cl-broker broker-vip1
> order broker-vip2-after-broker 0: cl-broker broker-vip2
> 
> For broker-vip2 I see completely different output (compare with 
> broker-vips:1):
> 
> * Resource action: broker-vips:1   migrate_to on c-pa-0
> * Pseudo action:   cl-broker-vips_stop_0
> * Resource action: broker-vips:1   migrate_from on c-pa-1
> * Resource action: broker-vips:1   stop on c-pa-0
> * Pseudo action:   cl-broker-vips_stopped_0
> * Pseudo action:   cl-ctdb_start_0
> * Resource action: ctdb            start on c-pa-1
> * Pseudo action:   cl-ctdb_running_0
> * Pseudo action:   cl-broker_start_0
> * Resource action: ctdb            monitor=10000 on c-pa-1
> * Resource action: broker          start on c-pa-1
> * Pseudo action:   cl-broker_running_0
> * Resource action: broker-vip2     migrate_to on c-pa-0
> * Pseudo action:   cl-broker-vips_start_0
> * Resource action: broker          monitor=10000 on c-pa-1
> * Resource action: broker-vip2     migrate_from on c-pa-1
> * Resource action: broker-vip2     stop on c-pa-0
> * Pseudo action:   broker-vips:1_start_0
> * Pseudo action:   cl-broker-vips_running_0
> * Pseudo action:   all_stopped
> * Pseudo action:   broker-vip2_start_0
> * Resource action: broker-vips:1   monitor=30000 on c-pa-1
> * Resource action: broker-vip2     monitor=30000 on c-pa-1
> 
> broker-vip2 is migrated much later than broker-vips:1, exactly at the point
> where I would expect it.
> 
> To me, that means some logic already exists that would allow postponing a
> resource move until everything is ready for it at the destination.
> 
> I also tried disabling migration for broker-vip2, and in that case it was
> also stopped too early.
> 
> So, there are four cases, and for only one of them do I get the expected result:
> *) g-u clone, migration disabled         - early stop
> *) g-u clone, migration enabled          - early stop
> *) ordinary resource, migration disabled - early stop
> *) ordinary resource, migration enabled  - stop at the expected point
> 
> The question is:
> 
> Is it strictly impossible to make non-migratable resources behave the same
> way as the migratable broker-vip2?
> 
> (I'm pretty sure I didn't mess up any details, but I want to recheck
> everything once again.)
> 
> Best,
> Vladislav
> 
>> 
>> It would be better to look at why broker-vips:1 needed to be moved.
>> 
>>> 
>>> Like:
>>> ===
>>> * Pseudo action:   cl-ctdb_start_0
>>> * Resource action: ctdb            start on c-pa-1
>>> * Pseudo action:   cl-ctdb_running_0
>>> * Pseudo action:   cl-broker_start_0
>>> * Resource action: ctdb            monitor=10000 on c-pa-1
>>> * Resource action: broker          start on c-pa-1
>>> * Pseudo action:   cl-broker_running_0
>>> * Pseudo action:   cl-broker-vips_start_0
>>> * Resource action: broker          monitor=10000 on c-pa-1
>>> * Pseudo action:   cl-broker-vips_stop_0
>>> * Resource action: broker-vips:1   stop on c-pa-0
>>> * Pseudo action:   cl-broker-vips_stopped_0
>>> * Resource action: broker-vips:1   start on c-pa-1
>>> * Pseudo action:   cl-broker-vips_running_0
>>> * Resource action: broker-vips:1   monitor=30000 on c-pa-1
>>> ===
>>> That would be a great optimization toward five nines...
>>> 
>>> Best,
>>> Vladislav
>>> 
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> 

