Re: [Pacemaker] start/stop operations fail to happen in parallel on resources

Andrew Beekhof Wed, 09 May 2012 21:18:49 -0700

On Fri, Apr 20, 2012 at 12:30 AM, David Vossel <dvos...@redhat.com> wrote:
> ----- Original Message -----
>> From: "Parshvi" <parshvi...@gmail.com>
>> To: pacema...@clusterlabs.org
>> Sent: Thursday, April 19, 2012 6:22:01 AM
>> Subject: [Pacemaker] start/stop operations fail to happen in parallel on resources
>>
>> Observations:
>> max-children=30
>> total no. of resources=18
>>
>> 1) With max-children at its default value of 4, the following logs
>> were observed, leading to monitor op timeouts for some resources (a
>> total of 18 rscs):
>>   a. “max_child_count (4) reached, postponing execution of operation
>>   monitor”
>>   b. “WARN: perform_ra_op: the operation operation monitor[18] on
>> ocf::IPaddr2::ClusterIP for client 3754, stayed in operation list for
>> 14100 ms (longer than 10000 ms)”
>>   c. SOLUTION: the max-children of lrmd was raised to 30.
>>   d. ISSUES STILL OBSERVED: while 2-3 resources are stuck in a start
>>   operation, if a rsc is issued an explicit start command
>>   `crm resource start rcs1`, then the start op on this rsc is delayed
>>   until one of the previous resources exits its start operation.
>>
>
> This is what I would expect to happen.  If an operation is in flight at the 
> same time you make a configuration change, I don't believe the change will be 
> looked at until the operation returns or times out.


Correct.  We wait for any in-flight operations to complete but do not
initiate any more.
You can also set batch-limit to prevent pacemaker from sending "too
many" operations to the lrmd in the first place, but setting
max-children to 30 on a decent machine doesn't seem unreasonable.
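
To illustrate why the default of 4 produced the "stayed in operation
list" warning, here is a minimal Python sketch of a queue limited by
max-children. It is not lrmd's actual code: the function name
queue_waits and the 3000 ms per-monitor duration are assumptions made
up for illustration, and all 18 operations are assumed to be queued at
the same moment.

```python
import heapq


def queue_waits(durations_ms, max_children):
    """Return how long each operation sits in the queue before it starts.

    Simplified model (an assumption, not lrmd's real scheduler): every
    operation is submitted at t=0 and operations start in submission
    order as soon as one of the max_children slots becomes free.
    """
    if max_children < 1:
        raise ValueError("max_children must be at least 1")

    # Min-heap of the times at which each slot next becomes free.
    free_at = [0] * max_children
    heapq.heapify(free_at)

    waits = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)   # earliest available slot
        waits.append(start)              # submitted at t=0, so wait == start
        heapq.heappush(free_at, start + duration)
    return waits


# Hypothetical workload: 18 monitor ops, each taking 3000 ms.
monitors = [3000] * 18
threshold_ms = 10000  # the "longer than 10000 ms" limit from the log

for limit in (4, 30):
    waits = queue_waits(monitors, limit)
    late = sum(1 for w in waits if w > threshold_ms)
    print(f"max-children={limit}: longest wait {max(waits)} ms, "
          f"{late} op(s) over the threshold")
```

Under these assumed numbers, a limit of 4 leaves the last operations
queued past the 10000 ms threshold, while a limit of 30 starts all 18
immediately. batch-limit works one level up: it caps how many
operations pacemaker hands to the lrmd at once, so fewer of them reach
this queue in the first place.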

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
