On 11/09/2013, at 9:33 PM, Lars Marowsky-Bree <l...@suse.com> wrote:
> On 2013-09-11T19:55:38, Andrew Beekhof <and...@beekhof.net> wrote:
>
>>> sorry for being thick, but I can't find this in the code now. Did this
>>> slip through again in April?
>> Apparently. But before we add it, I'd like to see if we can do something
>> coherent.
>> Having 3 (or more) different variables (batch-limit, migration-limit and
>> this) for controlling these things doesn't seem optimal or user friendly.
>
> Well, they're all doing something completely different.

No, they're all crude approximations designed to stop the cluster as a
whole from using up so much CPU/network/etc. that recovery introduces more
failures than it resolves.

>
> A cluster-wide limit on operations (batch-limit) limits the total
> cluster and network/storage load.
>
> The max_children prevent a given node from being overloaded by
> concurrent operations.

At the expense of introducing other failures... such as "I fired off an
action N seconds ago with a timeout < N and still haven't heard back",
which was possible if batch-limit and max_children were too far out of
balance.
Which is why any limiting needs to happen centrally, on the DC.

> (Reducing batch-limit to emulate this kills
> cluster-wide parallelism and is not optimal.) Clearly, it's not perfect
> either (since it assumes all rsc ops on a node are identical in
> weight; whereas in reality we may want to limit VM start-up to 4, but
> would happily see 32 IP addresses go up at once, or 48 monitors ...),
> but it is an appropriate simplification.
>
> migration-limit is indeed a special case (needed to limit nodes from
> being overloaded by migrate, which were at the time the only ops that
> affect two nodes at once - batch-limit="4" was too coarse a hammer). I
> do recall that we discussed making it more generic - so that one could
> configure cluster-/node-wide limits for certain operations of specific
> resource types, but that was (rightly) judged to be a rather complex can
> of worms by you.
>
>> If anything, we should likely be putting work into auto-tuning this
>> stuff instead. Somehow.
>
> I'm not sure about how batch-limit can be auto-tuned.

If the cib's CPU usage starts going too high, it's time to lower the limit.
Should be possible on Linux.

> migration-threshold is mostly a function of the network bandwidth, too.
>
> MAX_CHILDREN did, sort of, auto-tune (by defaulting to number of cores,
> or something similar, which was appropriate enough[1]).
>
> It can all be made into a generic, powerful, flexible mechanism that
> describes them all. But I'm afraid that it'd also be quite complex. I'm
> happy to think about it, but the three limits we have/had seemed
> sufficient for the real-world.
>
>
> Regards,
>    Lars
>
> [1] the main complaint was that it was configured via sysconfig, and not
> dynamic via a node attribute as it should be. When we reintroduce it, we
> may want to make nodes default to PCMK/LRMD_MAX_CHILDREN if unset in
> the CIB, and otherwise have that value override the environment
> variable? That'd be a benefit now that pcmk and lrmd are more closely
> married.

As above, the rate limiting needs to happen on the DC, which lends itself
to being a property of the cib and/or transition graph rather than
something defined in sysconfig.

>
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
> HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
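
To make the auto-tuning idea above a bit more concrete (lower the limit when
the cib's CPU usage gets too high), here is a rough sketch of what such a
policy might look like. None of this is Pacemaker code: the function name,
the thresholds, the step sizes and the floor/ceiling are all hypothetical
and chosen only for illustration. How the CPU figure is sampled and how the
new value gets applied are deliberately left out.

```python
def next_batch_limit(current, cib_cpu_percent,
                     high=80.0, low=40.0, floor=1, ceiling=30):
    """Propose a new batch-limit from the cib's recent CPU usage.

    Hypothetical policy (all numbers are assumptions, not Pacemaker defaults):
      - above `high` percent: halve the limit to back off quickly
      - below `low` percent: raise the limit by one to recover slowly
      - otherwise: leave the limit unchanged
    The result is always clamped to the range [floor, ceiling].
    """
    if cib_cpu_percent > high:
        proposed = current // 2
    elif cib_cpu_percent < low:
        proposed = current + 1
    else:
        proposed = current
    return max(floor, min(ceiling, proposed))
```

Backing off faster than recovering is just one possible choice; the point
is only that the decision is made in one place (the DC) from one observed
signal, rather than by each node independently.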