[Pacemaker] start/stop operations fail to happen in parallel on resources

Parshvi Thu, 19 Apr 2012 04:25:56 -0700

Observations:
max-children=30
total number of resources=18

1) With max-children at its default value of 4, the following messages were
logged, and the resulting delays caused monitor operations on some of the 18
resources to time out:
  a. “max_child_count (4) reached, postponing execution of operation monitor”
  b. “WARN: perform_ra_op: the operation operation monitor[18] on ocf::IPaddr2::ClusterIP for client 3754, stayed in operation list for 14100 ms (longer than 10000 ms)”
  c. SOLUTION: lrmd's max-children was raised to 30.
  d. ISSUES STILL OBSERVED: while 2-3 resources are stuck in their start
operation, if an explicit start command is issued for another resource
(`crm resource start rcs1`), the start operation on that resource is delayed
until one of the earlier resources exits its start operation.
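The queueing behaviour in a.-c. can be pictured with a toy model. This is a minimal sketch, assuming operations are dispatched first-in-first-out onto a fixed pool of max-children slots and each operation holds its slot for a fixed duration. `Op`, `simulate` and the 5 s run time are hypothetical illustrations, not lrmd's actual code; only the limits (4, 30), the 18 resources and the 10000 ms threshold come from the report above.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Op:
    """A hypothetical resource operation: name, submit time and run time in ms."""
    name: str
    submit_ms: int
    duration_ms: int


def simulate(ops, max_children, warn_after_ms=10_000):
    """Dispatch ops FIFO onto at most `max_children` concurrent child slots.

    Returns (waits, warnings): the time each op spent queued before starting,
    and a message for every op queued longer than `warn_after_ms`.
    """
    # Each heap entry is the time (ms) at which one child slot becomes free.
    slots = [0] * max_children
    heapq.heapify(slots)
    waits, warnings = {}, []
    for op in sorted(ops, key=lambda o: o.submit_ms):  # FIFO by submit time
        free_at = heapq.heappop(slots)          # earliest slot to become free
        start = max(op.submit_ms, free_at)      # postponed while all slots busy
        wait = start - op.submit_ms
        waits[op.name] = wait
        if wait > warn_after_ms:
            warnings.append(
                f"{op.name} stayed in operation list for {wait} ms "
                f"(longer than {warn_after_ms} ms)"
            )
        heapq.heappush(slots, start + op.duration_ms)  # slot busy until op ends
    return waits, warnings


if __name__ == "__main__":
    # 18 monitors submitted together, each assumed to run 5 s (made-up numbers).
    monitors = [Op(f"monitor[{i}]", 0, 5000) for i in range(18)]
    for limit in (4, 30):
        _, warns = simulate(monitors, max_children=limit)
        print(f"max_children={limit}: {len(warns)} ops queued > 10000 ms")
```

With the limit at 4, six of the 18 monitors wait more than 10000 ms in this model, much like the warning in b.; at 30 none do. The same model would start a new operation such as the one on rcs1 immediately whenever fewer than 30 operations are running, so the delay in d. does not seem to be explained by the child-process limit alone (a reading of the toy model, not a diagnosis of lrmd).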




_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
