Observations:
max-children=30
total no. of resources=18

1) With max-children at its default value of 4, the following log messages were observed, and they led to monitor op timeouts for some of the 18 rscs:
   a. "max_child_count (4) reached, postponing execution of operation monitor"
   b. "WARN: perform_ra_op: the operation operation monitor[18] on ocf::IPaddr2::ClusterIP for client 3754, stayed in operation list for 14100 ms (longer than 10000 ms)"
   c. SOLUTION: the max-children of lrmd was raised to 30.
   d. ISSUES STILL OBSERVED: while 2-3 resources are stuck in their start operation, if a rsc is issued an explicit start command `crm resource start rcs1`, the start op on this rsc is delayed until one of the previous resources exits its start operation.
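To illustrate why the monitor ops in 1a/1b sat in the operation list for so long, here is a rough sketch that models max-children as a fixed pool of child slots. It is not lrmd's actual scheduler; the function name `queue_waits` and the 3500 ms per-op duration are assumptions picked for illustration only, and all ops are assumed to be queued at the same moment.

```python
import heapq

def queue_waits(num_ops, max_children, op_duration_ms):
    """Return how long each op waits (ms) before a child slot is free.

    Simplified model: every op is queued at t=0 and takes the same time.
    """
    slots = [0] * max_children          # time at which each slot becomes free
    heapq.heapify(slots)
    waits = []
    for _ in range(num_ops):
        start = heapq.heappop(slots)    # earliest slot to free up
        waits.append(start)             # queued at t=0, so wait == start time
        heapq.heappush(slots, start + op_duration_ms)
    return waits

# 18 resources, hypothetical 3500 ms per monitor op
waits_default = queue_waits(18, 4, 3500)
waits_raised = queue_waits(18, 30, 3500)

print("max wait with max-children=4: ", max(waits_default), "ms")
print("max wait with max-children=30:", max(waits_raised), "ms")
print("ops waiting longer than 10000 ms at max-children=4:",
      sum(1 for w in waits_default if w > 10000))
```

Under these assumptions the last ops wait behind four full rounds of earlier ops when only 4 slots exist, while 30 slots let all 18 ops start at once. This only covers the slot limit; it does not model whatever causes the delay described in 1d.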
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org