On 2013-06-05T20:44:56, Michael Schwartzkopff <mi...@clusterbau.com> wrote:
Hi Michael,

yes, the idea of making utilization more dynamic is something Andrew and I looked into ages ago. In particular, there's still the open issue that it is somewhat unfortunate that one has to configure the values at all. It would be nice if monitor_0 could "discover" the memory/CPU values from the VM (for example), populate the CIB accordingly, and keep those in sync.

Pacemaker is not necessarily the best tool for reacting quickly to changing load, though. The utilization feature is concerned with *correctness* first - namely, don't overcommit resources severely. For Xen/VMs in general, that means not overcommitting physical memory (which could even prevent resources from starting at all), making sure there's at least 0.5 CPU cores available per VM, etc., all without the admin having to figure out the node scores manually. Ease of configuration and all that.

Some constructive feedback: the dampening in your approach isn't sufficient. It could cause a reshuffling of resources with every update; even taking into account that this is possible via live migration, it would be a major performance impact. I think what you want instead are thresholds: only if the resource utilization stays above XXX for YYY consecutive samples, update the CIB, so that the service can be moved to a more powerful server. Lower the requirements again only if the system stays below a fall-back threshold for a given period. You want to minimize movement. And add a scale factor, so you can allow for some overcommit if desired. [*]

You also want this because you want to avoid needless PE runs. In your current example, you're going to cause a PE run for *every* *single* monitor operation on *any* VM. And, of course, this should be optional and protected via a configuration parameter.

But the real issue: rising CPU utilization is only a problem if the service's performance suffers in turn.
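The threshold/fall-back idea could be sketched roughly like this. This is a minimal illustration only: the class name and the thresholds are made up, and the part that would actually write the new value into the CIB (via attrd or similar) is deliberately left out.

```python
class UtilizationDamper:
    """Dampen dynamic utilization updates with hysteresis.

    Only raise the advertised utilization after the measured value has
    stayed above `high` for `up_count` consecutive samples, and only
    lower it after it has stayed below `low` for `down_count` samples.
    `scale` allows deliberate overcommit (e.g. 0.8 advertises 80% of
    the measured value)."""

    def __init__(self, high, low, up_count, down_count, scale=1.0):
        self.high, self.low = high, low
        self.up_count, self.down_count = up_count, down_count
        self.scale = scale
        self._above = 0  # consecutive samples above `high`
        self._below = 0  # consecutive samples below `low`

    def sample(self, measured):
        """Feed one monitor sample.

        Returns the new value to publish, or None if nothing should
        change (i.e. no CIB update, hence no PE run)."""
        if measured > self.high:
            self._above += 1
            self._below = 0
            if self._above >= self.up_count:
                self._above = 0
                return measured * self.scale
        elif measured < self.low:
            self._below += 1
            self._above = 0
            if self._below >= self.down_count:
                self._below = 0
                return measured * self.scale
        else:
            self._above = self._below = 0
        return None
```

With up_count=3, three consecutive monitor cycles above the high-water mark are needed before a single CIB update happens, instead of one update (and one PE run) per monitor operation.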
Basically, you don't want to move resources because their CPU utilization rises, but when the performance of the services hosted on a node degrades. Hence, I'd agree that dynamic load adjustment is best done outside Pacemaker.

At the very least, you'd want to synchronize updating the load factors of all the VMs at once, so that the PE can shuffle them once, not repeatedly. While the data gathering (* as outlined above) could happen in the RA, I think you need to involve at least something like attrd to dampen the updates; you don't want each RA to implement its own threshold/stepping logic independently.

Note that all our normal probes - including the nagios ones - are concerned with a "healthy"/"failed" dichotomy only, too. They don't really offer SLA/response-time data, short of "well duh, I timed out". This could be something worth adding to a consolidated framework ("yellow" - move me somewhere else, I'm out of resources here). I have the impression you'd quickly end up implementing something close to heat/OpenStack then. Not that I'm opposed to that ;-)

Regards,
    Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org