Hi,

On Thu, Jan 5, 2012 at 6:43 PM, Graantik <graan...@gmail.com> wrote:
> Hi all,
>
> I have a task that I think can logically be implemented using a
> pacemaker/corosync cluster with many nodes (e.g. 15) and maybe a thousand
> or more resources. Most of the resources are parametrized processes
> controlled by a custom resource agent. The resources are added and removed
> dynamically, typically many (e.g. 100) at one time.
>
> My first tests in a VM environment show that - even after some tuning of
> lrmd max-children and custom-batch-limit, optimizing the RA and having the
> processes idle - adding so many resources in one step (XML-based) appears
> to bring the cluster to its knees, i.e. nodes become unresponsive, the DC
> and other nodes have very high load, and the operation takes an hour or
> longer.
>
> Does this mean that the design limit of this software/hardware is reached,
> or are there ways, like tuning or best practices, to make such a scenario
> work?

In terms of performance testing on large clusters, there is an article that
may be interesting to read:
http://theclusterguy.clusterlabs.org/post/1241986422/large-cluster-performance

The article talks about using 10000 resources, which is more than your use
case needs, so you can compare the timings you have observed with the ones
presented there and go from there.

Bear in mind that when dealing with so many resources and nodes, it might
help to tweak certain things, among others:
- the maximum message size for corosync (the article mentions using 256k);
- the corosync token timeout, which might have to be increased, as high load
  on the systems may delay replies in network traffic;
- multicast: having to sync the CIB onto ~15 nodes, as you mentioned, means
  that you _should_ use multicast, and the switches must support IGMP
  snooping and have it enabled and properly configured;
- network isolation: the entire cluster should be in a separate VLAN, or
  have some other form of dedicated network, to ensure not only throughput
  but also low latency, and to prevent interference from other network
  traffic.

> Are there known implementations of comparable size?

In terms of nodes, most I know of are clusters of ~10-12 nodes; in terms of
resources, none that I know of.

HTH,
Dan

> Thanks
> Gerhard
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

--
Dan Frincu
CCNA, RHCE