Hi,

On Mon, Sep 30, 2013 at 02:49:02PM +0200, Moullé Alain wrote:
> Hi,
>
> sorry for the delay on this thread, I was unavailable for a few weeks,
> but just FYI, I wanted to share some results I got a few weeks ago:
>
> I ran some tests on the configuration and start/stop of 500 Dummy
> resources, and I got these time values:
>
> 1/ configuration with successive crm commands "crm configure
> primitive ...": it takes about 1 hour, so it is not usable
> 2/ with a single crm command "crm configure < File" with all Dummy
> primitives in File: it takes 7 s, so that's OK
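For illustration, the single-file approach in 2/ can be produced with a short shell loop; the resource names (dummy-1 ... dummy-500) and the monitor interval below are made-up values for the sketch, not anything from the original test:

```shell
#!/bin/sh
# Generate one crm input file containing 500 Dummy primitives, so that
# "crm configure < dummies.crm" commits them all in a single pass
# instead of invoking cibadmin separately for each resource.
i=1
while [ "$i" -le 500 ]; do
    printf 'primitive dummy-%d ocf:pacemaker:Dummy op monitor interval=30s\n' "$i"
    i=$((i + 1))
done > dummies.crm

# Then load everything in one commit (requires a running cluster):
# crm configure < dummies.crm
```

The same idea applies to 3/: put all 500 location constraints in one file and feed it to crm in a single invocation.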
It is to be expected that the first way takes such a long time. crm
configure invokes cibadmin a few times, and each commit takes a while.
If you have more than one thing to tell it, it is best to do all of it
at once. Note, however, that merely configuring the resources doesn't
mean that they all got started.

> 3/ adding just one location constraint for each Dummy primitive with
> "crm configure < File" with all constraints in File: it takes 27 s,
> which is strange but acceptable
> 4/ start of the 500 primitives with successive crm commands "crm
> resource start ...": it takes 7 min 28 s, which seems not acceptable,
> all the more so for Dummy resources ...
> 5/ start of the 500 primitives with parallel (background) crm
> commands "crm resource start ... &": not possible, lots of the
> commands exit with errors, and in any case it also takes a long time
> 6/ start of the 500 primitives in parallel by setting all
> target-roles to "Started" in Pacemaker:
> => with crm configure edit: s/Stopped/Started on the 500 primitives
> Result: around 6 min for all primitives to be started. That seems not
> acceptable, all the more so for Dummy resources, and it would take,
> let's say, about 3 min for a failover if the primitives are
> well located, half on one node and half on the other.
>
> These results are with Dummy resources, and we can imagine that with
> real resources it will take much longer, not to speak of the
> periodic monitoring of 500 primitives ...

Of course it would take much longer, depending on the applications.

> So, based on these results, I think that the practical limit on the
> number of resources is far below 500 ...
>
> But I wanted to give these results just to keep this subject going
> and perhaps gather some ideas ...

Which version of Pacemaker did you test with? Did you try to adjust
the properties/parameters which influence how many resource actions
can be executed in parallel (LRM_MAX_CHILDREN, batch-limit)?
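For reference, the two knobs mentioned above are adjusted roughly as follows; the values here are arbitrary examples for the sketch, not tuned recommendations, and the exact file holding the lrmd environment varies by distribution:

```shell
# Raise the number of operations the local resource manager (lrmd)
# may run in parallel per node; this is read from lrmd's environment,
# e.g. set in /etc/sysconfig/pacemaker on SUSE-like systems:
LRM_MAX_CHILDREN=8

# Raise the cluster-wide cap on concurrently executing actions;
# run against a live cluster:
# crm configure property batch-limit=30
```

Flipping all target-roles in one edit, as in 6/, has the advantage of being a single CIB commit, after which these two limits govern how quickly the 500 starts actually get executed.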
Thanks,

Dejan

> Thanks
> Alain
>
> Le 05/09/2013 10:58, Lars Marowsky-Bree a écrit :
> >On 2013-09-04T08:26:14, Ulrich Windl <[email protected]>
> >wrote:
> >
> >>In my experience network traffic grows roughly linearly with the
> >>size of the CIB. At some point you probably have to change
> >>communication parameters to keep the cluster in a happy
> >>communication state.
> >Yes, I wish corosync would "auto-tune" to a higher degree.
> >Apparently, though, that's a slightly harder problem.
> >
> >We welcome any feedback on required tunables. Those that we ship on
> >SLE HA worked for us (even for rather large configurations), but
> >they may not be appropriate everywhere.
> >
> >>Cluster internals aside, there may be problems if a node goes
> >>online and hundreds of resources are started in parallel,
> >>specifically if those resources weren't designed for it. I suspect
> >>IP addresses, MD-RAIDs, LVM stuff, drbd, filesystems, exportfs, etc.
> >No, most of these resource scripts *are* supposed to be
> >concurrency-safe. If you find something that breaks, please share
> >the feedback.
> >
> >It's true that the way concurrent load limitation is implemented in
> >Pacemaker/LRM isn't perfect yet. batch-limit is rather coarse. The
> >per-node LRM child limit is probably the best bet right now, but it
> >doesn't differentiate between starting many light-weight resources
> >in parallel (such as IPaddr) versus heavy-weights (VMs with Oracle
> >databases).
> >
> >(migration-threshold goes in the same direction.)
> >
> >Historical context matters. Pacemaker comes from the HA world; we
> >still believe 3-7 node clusters are the largest anyone ought to
> >reasonably build, considering the failure/admin/security domain
> >issues with single points of failure and the increasing likelihood
> >of double failures etc.
> >
> >But there are several trends -
> >
> >Even those 3-7 nodes become increasingly powerful multi-core
> >kick-ass boxes.
> >7 nodes might well host hundreds of resources nowadays (say,
> >above 70 VMs with all their supporting resources).
> >
> >People build much larger clusters because there's no good way to
> >"divide and conquer" yet - e.g., if you build several 3 or 5 node
> >clusters, there's no support for managing those
> >clusters-of-clusters.
> >
> >And people use Pacemaker for HPC-style deployments (e.g., private
> >clouds with tons of VMs) - because while our HPC support is
> >suboptimal, it is better than the HA support in most of the Cloud
> >offerings.
> >
> >
> >>As a note: just recently we had a failure in MD-RAID activation
> >>with no real reason to be found in syslog, and the cluster got
> >>quite confused.
> >>(I had reported this to my favourite supporter (SR 10851868591),
> >>but haven't heard anything since then ...)
> >I'll try to dig that out of the support system and give it a look.
> >
> >
> >Regards,
> >    Lars

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
