On 4/1/25 03:50, DERUMIER, Alexandre wrote:
my 2 cents, but everybody in the industry is calling this
affinity/anti-affinity (VMware, Nutanix, Hyper-V, OpenStack, ...).
More precisely, VM affinity rules (vm <-> vm) vs. node affinity rules
(vm -> node, the current HA group).

Personally, I don't care, it's just a name ^_^.

But I have a lot of customers asking "does Proxmox support
affinity/anti-affinity?", and if they do their own research, they
will think that it doesn't exist.
(Or, at a minimum, write somewhere in the docs something like "aka VM
affinity", or mention it in commercial presentations ^_^)

I see your point, and I also called it affinity/anti-affinity before. But if we go the HA Rules route here, it'd be really neat to end up with "Location Rules" and "Colocation Rules" coexisting, which clearly shows the distinction between them, as both are affinity rules, at least to me.

I'd definitely make sure that it is clear from the release notes and documentation that this adds the ability to assign affinity between services, but let's wait for some other comments on this ;).

On 4/1/25 03:50, DERUMIER, Alexandre wrote:
A more serious question: I haven't read all the code yet, but how does
it play with the current TOPSIS placement algorithm?

I currently implemented the colocation rules as a constraint on which nodes the manager can select as a recovery target for the to-be-migrated service.

So if users use the static load scheduler (and, for that matter, the basic service-count scheduler too), the colocation rules just make sure that no recovery node is selected that would contradict them. The TOPSIS algorithm itself isn't changed at all.
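To make that concrete, here is a minimal sketch in Python (the HA manager itself is written in Perl) of colocation rules acting purely as a node filter in front of an unchanged scheduler; all names are illustrative assumptions, not the actual pve-ha-manager API:

from dataclasses import dataclass

@dataclass
class Rule:
    services: set          # e.g. {"vm:101", "vm:102"}
    together: bool         # True = keep together, False = keep apart
    strict: bool           # MUST (True) vs. SHOULD (False)

def allowed_nodes(service, online_nodes, placements, rules):
    """Prune the recovery candidates for `service`.

    `placements` maps service id -> current node. Strict (MUST) rules
    always apply; a non-strict (SHOULD) rule is ignored if honoring it
    would leave no candidate node at all.
    """
    allowed = set(online_nodes)
    for rule in rules:
        if service not in rule.services:
            continue
        peer_nodes = {placements[s] for s in rule.services
                      if s != service and s in placements}
        if not peer_nodes:
            continue  # no placed peers, nothing to constrain
        candidate = (allowed & peer_nodes if rule.together
                     else allowed - peer_nodes)
        if candidate or rule.strict:
            allowed = candidate
    return allowed

# The scheduler then only scores the remaining nodes, so the TOPSIS
# ranking itself stays untouched, e.g.:
#   best = max(allowed_nodes(...), key=lambda n: topsis_score(svc, n))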

There are two things that should/could be changed in the future (besides the many future ideas I already pointed out):

- (1) the schedulers will still consider all online nodes, i.e. even though HA groups and/or colocation rules restrict the allowed nodes in the end, the calculation is done for all nodes, which could be significant for larger clusters, and

- (2) services are currently recovered one by one in a best-fit fashion, i.e. there's no ordering by the services' resource needs, etc. There could be some edge cases (e.g. think of a failing node with a bunch of services that must be kept together; these should now be migrated to the same node if possible, or placed on the minimum number of nodes) where the algorithm could find better solutions if it either ordered the to-be-recovered services, and/or the utilization scheduler knew about the 'keep together' colocations and considered each of these sets (and all their subsets) as a single service.

For the latter, the complexity explodes a bit and is harder to test for, which is why I've gone for the current implementation: it also reduces the burden on users to reason about what could happen with a specific set of rules and already allows the notion of MUST/SHOULD. This gives enough flexibility to improve the decision making of the scheduler in the future; a rough sketch of the grouping/ordering idea follows.
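To illustrate (and to be clear, none of this exists in the current series), here is one way, in Python, that the recovery order could account for 'keep together' rules: merge each positive-colocation set into a single recovery unit and place the largest total demand first, best-fit-decreasing style:

def recovery_order(services, demand, together_groups):
    """Return recovery units, biggest total demand first.

    `demand` maps service id -> a load value; `together_groups` is a
    list of service-id sets coming from strict 'keep together' rules.
    """
    grouped = set().union(*together_groups) if together_groups else set()
    units = [sorted(g) for g in together_groups]
    units += [[s] for s in services if s not in grouped]
    # Placing big units first avoids stranding a large 'keep together'
    # set after small services have fragmented the remaining capacity.
    return sorted(units, key=lambda u: sum(demand[s] for s in u),
                  reverse=True)

# Example:
#   recovery_order(["vm:101", "vm:102", "vm:103"],
#                  {"vm:101": 4, "vm:102": 4, "vm:103": 1},
#                  [{"vm:101", "vm:102"}])
#   -> [["vm:101", "vm:102"], ["vm:103"]]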

On 4/1/25 03:50, DERUMIER, Alexandre wrote:
Small feature request from students and customers: a lot of them are
asking to be able to use VM tags in the colocation/affinity rules.

Good idea! We were thinking about this too and I forgot to add it to the list, thanks for bringing it up again!

Yes, the idea would be to make pools and tags available as selectors for rules here, so that rule membership can be changed dynamically by just adding a tag to a service.
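As a purely hypothetical sketch of what resolving such selectors could look like (the selector syntax below is made up for illustration, not taken from the patch series):

def resolve_selector(selector, services):
    """Expand a selector into a set of service ids.

    `services` maps service id -> metadata with 'tags' and 'pool'.
    Illustrative selector forms:
      'vm:101'        a literal service id
      'tag:database'  every service carrying that tag
      'pool:prod'     every service in that pool
    """
    kind, _, value = selector.partition(':')
    if kind == 'tag':
        return {sid for sid, meta in services.items()
                if value in meta.get('tags', ())}
    if kind == 'pool':
        return {sid for sid, meta in services.items()
                if meta.get('pool') == value}
    return {selector} if selector in services else set()

# Tagging a new VM with 'database' then pulls it into every rule whose
# selector is 'tag:database', with no rule edit needed.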

The only thing we have to consider here is that HA rules go through a verification phase, where invalid rules are dropped or modified to make them applicable. Also, these external changes must be detected somehow in the HA stack, as I want to keep the number of runs through the verification code to a minimum, i.e. only when the configuration is changed by the user. But that will be a discussion for another series ;).
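For illustration, a sketch of gating the verification pass on a config digest so it only runs when the user actually changed the rules; the single check shown (a colocation rule needs at least two services) is just a placeholder for the real verification, and all names are assumptions:

import hashlib

_cache = {"digest": None, "rules": []}

def load_rules(raw_config, parse):
    """Parse and verify rules only when the config text has changed."""
    digest = hashlib.sha256(raw_config.encode()).hexdigest()
    if digest == _cache["digest"]:
        return _cache["rules"]          # config untouched, reuse result
    rules = [r for r in parse(raw_config)
             if len(r.services) >= 2]   # drop trivially invalid rules
    _cache["digest"], _cache["rules"] = digest, rules
    return rules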

