On 2009-10-30T19:41:35, Yan Gao <y...@novell.com> wrote:

Hi Yan Gao,
excellent! Before reviewing the code, let's review the
interface/configuration though.

> User case:
> Xen guests have memory requirements; nodes cannot host more guests than
> the node has physical memory installed.
>
>
> Configuration example:
>
> node yingying \
>     attributes capacity="100"
> primitive dummy0 ocf:heartbeat:Dummy \
>     meta weight="90" priority="2"
> primitive dummy1 ocf:heartbeat:Dummy \
>     meta weight="60" priority="1"
> ..
> property $id="cib-bootstrap-options" \
>     limit-capacity="true"

First, I would prefer not to contaminate the regular node attribute
namespace; the word "capacity" might already be used. Second, the
"weight" is just one dimension, which is somewhat difficult.

I'd propose to introduce a new XML element, "resource_utilization" (name
to be decided ;-) containing a "nvset", and which can be used in a node
element or a resource primitive.

This creates a new namespace, avoiding clashes, and distinguishes the
utilization parameters from the other various attributes. Further, it
trivially allows for several user-defined metrics.

    node hex-0 \
        utilization memory="4096" cpu="8" ...
    primitive dummy0 ocf:heartbeat:Dummy \
        meta priority="2"
        utilization memory="2048" cpu="2"
    primitive dummy1 ocf:heartbeat:Dummy \
        utilization memory="3012"
    primitive dummy2 ocf:heartbeat:Dummy \
        utilization cpu="6"

dummy0 + dummy2 could both be placed on hex-0, or dummy1 + dummy2, but
not dummy0 + dummy1.

"Placement allowed where none of the utilization parameters would become
negative." (i.e., iterate over the utilization attributes specified for
the resource.)

> If we don't want to enable capacity limit. We could set property
> "limit-capacity" to "false", or default it.

Right, a cluster property to globally disable/enable this is a very good
idea.

> I also noticed a likely similar planned feature described in
> http://clusterlabs.org/wiki/Planned_Features
>
> "Implement adaptive service placement (based on the RAM, CPU etc.
> required by the service and made available by the nodes) "
>
> Indeed, this try only supports single kind of capacity, and it's not
> adaptive... Do you already have a thorough consideration about this
> feature?

I think this is a two-phase feature for the PE:

The first phase is what you propose - make sure we do not overload any
given node, basically implementing hard limits.

The second phase would be for the PE to actually try to "optimize"
placement, and try to solve the constraints imposed by the utilization
versus capacity scores to a) place as many resources as possible
successfully, and b) to either spread them thinly (load distribution) or
condensed (load concentration, think power savings by being able to put
some nodes to sleep).

The first phase should, IMHO, be quite easy to implement.

The second one is significantly more difficult, and we'd need to pull in
an optimization library to solve this for us. It's conceivable that for
this to happen, we'd need to disable the normal "rsc_location" rules
altogether because they'd interfere badly. (And it is interesting to
note that the rsc_collocation constraints can be mapped into this scheme
and entirely handled by this solver.)

There is the "adaptive" bit, of course, where the utilization of the
resources and the nodes is automatically determined and adjusted based
on utilization monitoring. This is even more challenging and frequently
considered a research problem.

In summary, I think phase one is urgently needed; thankfully, it is
straightforward to solve too, and the admin can influence placement with
priorities and scoring sufficiently to avoid resources being offlined
due to resource collisions too frequently.

Phase two is a "solved problem" from an algorithmic point of view, but
implementing it is probably not quite as trivial. I'd welcome seeing
this happen too.

Adaptive placement ... anyone who wants to write a master's or PhD
thesis around this?
;-)


Best,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
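
To make the phase-one rule concrete ("Placement allowed where none of
the utilization parameters would become negative"), here is a minimal
Python sketch of that check. It is only an illustration, not the PE
implementation: the names fits, place and can_host are made up for this
example, and treating a metric the node does not define as zero capacity
is an assumption, since the proposal does not say how that case should
be handled. The numbers are the hex-0 and dummy0..dummy2 values from the
configuration example in the mail.

```python
from typing import Dict, List, Optional

# A utilization set maps a user-defined metric name to an integer amount.
Utilization = Dict[str, int]


def fits(remaining: Utilization, required: Utilization) -> bool:
    """True if subtracting the resource's needs leaves no metric negative.

    Only the metrics the resource specifies are checked. A metric missing
    on the node counts as zero capacity (assumption of this sketch).
    """
    return all(remaining.get(name, 0) - amount >= 0
               for name, amount in required.items())


def place(remaining: Utilization, required: Utilization) -> Optional[Utilization]:
    """Return the remaining capacity after placement, or None if it does not fit."""
    if not fits(remaining, required):
        return None
    updated = dict(remaining)  # leave the caller's dict untouched
    for name, amount in required.items():
        updated[name] = updated.get(name, 0) - amount
    return updated


def can_host(capacity: Utilization, resources: List[Utilization]) -> bool:
    """True if all given resources can be placed on one node together."""
    remaining = dict(capacity)
    for required in resources:
        after = place(remaining, required)
        if after is None:
            return False
        remaining = after
    return True


# Values taken from the configuration example in the mail.
hex0 = {"memory": 4096, "cpu": 8}
dummy0 = {"memory": 2048, "cpu": 2}
dummy1 = {"memory": 3012}
dummy2 = {"cpu": 6}

print(can_host(hex0, [dummy0, dummy2]))  # True
print(can_host(hex0, [dummy1, dummy2]))  # True
print(can_host(hex0, [dummy0, dummy1]))  # False: 2048 + 3012 > 4096
```

This only covers the hard limit on a single node. Which resource gets
placed first when not everything fits (priorities, scores) is left out
on purpose.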