16] HA colocation rules

Daniel Kral Fri, 25 Apr 2025 01:37:05 -0700

Thanks for the review, Fiona!

I have some comments left, one of them is about the last comment abouthow to migrate HA groups to location rules to give a better illustrationwhy I'd like to allow multiple location rules in the end, hope we'reable to do this.


On 4/24/25 12:12, Fiona Ebner wrote:

Am 25.03.25 um 16:12 schrieb Daniel Kral:

| Canonicalization
----------

Additionally, colocation rules are currently simplified as follows:

- If there are multiple positive colocation rules with common services
   and the same strictness, these are merged to a single positive
   colocation rule.


Do you intend to do that when writing the configuration file? I think
rules are better left unmerged from a user perspective. For example:

- services 1, 2 and 3 should strictly stay together, because of reason A
- services 1 and 3 should strictly stay together, because of different
reason B

Another scenario might be that the user is currently in the process of
editing some rules one-by-one and then it might also be surprising if
something is auto-merged.

You can of course always dynamically merge them when doing the
computation for the node selection.

This is what I had in mind and I should have made the description forthat clearer here. It is only for computing the feasibility of the ruleswhen (1) creating, (2) updating, and (3) applying them.

As suggested by @Lukas off-list, I'll also try to make the checkselective, e.g. the user has made an infeasible change to the configmanually by writing to the file and then wants to create another rule.Here it should ignore the infeasible rules (as they'll be droppedanyway) and only check if the added rule / changed rule is infeasible.

But as you said, it must not change the user's configuration in the endas that would be very confusing to the user.


In the same spirit, a comment field for each rule where the user can put
the reason might be nice to have.

Another question is if we should allow enabling/disabling rules.

Comment and enabling can of course always be added later. I'm just not
sure we should start out with the auto-merging of rules.

Good idea, I think there are definitely use cases for enabling/disablingthe rules and it's easy to implement, will add that to v1 :).

| Inference rules
----------

There are currently no inference rules implemented for the RFC, but
there could be potential to further simplify some code paths in the
future, e.g. a positive colocation rule where one service is part of a
restricted HA group makes the other services in the positive colocation
rule a part of this HA group as well.


If the rule is strict. If we do this I think it should only happen
dynamically for the node selection too.

Yes, I'll take a closer look here, but I fully agree that this partshould also be done dynamically as the steps above.

I'll see if that could improve something and wouldn't be unnecessaryoverhead that will be handled by the node selection in the end anyway.Roughly speaking, I'd like the select_service_node(...) to mostlyconsist of the steps (as already done now with HA groups):


apply_location_rules(...);
apply_colocation_rules(...);

$scores = score_nodes_to_start_service(...);
/* select the best node according to utilization) */

Comment about HA groups -> Location Rules
-----------------------------------------

This part is not really part of the patch series, but still worth for an
on-list discussion.

I'd like to suggest to also transform the existing HA groups to location
rules, if the rule concept turns out to be a good fit for the colocation
feature in the HA Manager, as HA groups seem to integrate quite easily
into this concept.

This would make service-node relationships a little more flexible for
users and we'd be able to have both configurable / visible in the same
WebUI view, API endpoint, and configuration file. Also, some code paths
could be a little more consise, e.g. checking changes to constraints and
canonicalizing the rules config.

The how should be rather straightforward for the obvious use cases:

- Services in unrestricted HA groups -> Location rules with the nodes of
   the HA group; We could either split each node priority group into
   separate location rules (with each having their score / weight) or
   keep the input format of HA groups with a list of
   `<node>(:<priority>)` in each rule

- Services in restricted HA groups -> Same as above, but also using
   either `+inf` for a mandatory location rule or `strict` property
   depending on how we decide on the colocation rule properties


I'd prefer having a 'strict' property, as that is orthogonal to the
priorities and that aligns it with what you propose for the colocation
rules.

ACK, I forgot to remove this bit as I dropped the idea of the flat'score' property.

This would allow most of the use cases of HA groups to be easily
migratable to location rules. We could also keep the inference of the
'default group' for unrestricted HA groups (any node that is available
is added as a group member with priority -1).


Nodes can change, so adding them explicitly will mean it can get
outdated. This should be implicit/done dynamically.

Yes, I should have stated this more clearly: it was meant to bedynamically inferred instead of statically written to the config as thiswould only clutter the config with "useless" rules for any service thatdidn't have a HA group before ;).

The only thing that I'm unsure about this, is how we would migrate the
`nofailback` option, since this operates on the group-level. If we keep
the `<node>(:<priority>)` syntax and restrict that each service can only
be part of one location rule, it'd be easy to have the same flag. If we
go with multiple location rules per service and each having a score or
weight (for the priority), then we wouldn't be able to have this flag
anymore. I think we could keep the semantic if we move this flag to the
service config, but I'm thankful for any comments on this.

My gut feeling is that going for a more direct mapping, i.e. each
location rule represents one HA group, is better. The nofailback flag
can still apply to a given location rule I think? For a given service,
if a higher-priority node is online for any location rule the service is
part of, with nofailback=0, it will get migrated to that higher-priority
node. It does make sense to have a given service be part of only one
location rule then though, since node priorities can conflict between rules.


Yeah, I think this is the reasonable option too.

I briefly discussed this with @Fabian off-list and we also agreed thatit would be good to make location rules as 1:1 to location rules aspossible and keep the nofailback per location rule, as the behavior ofthe HA group's nofailback could still be preserved - at least if there'sonly a single location rule per service at least.

---

On the other hand, I'll have to take a closer look if we can dosomething about the blockers when creating multiple location rules wheree.g. one has nofailback enabled and the other has not. As you alreadysaid, they could easily conflict between rules...

My previous idea was to make location rules as flexible as possible, sothat it would theoretically not matter if one writes:


location: rule1
    services: vm:101
    nodes: node1:2,node2:1
    strict: 1

or:

location: rule1
    services: vm:101
    nodes: node1
    strict: 1

location: rule2
    services: vm:101
    nodes: node2
    strict: 1

The order which one's more important could be encoded in the order whichit is defined (if one configures this in the config it's easy, and I'dadd an API endpoint to realize this over the API/WebGUI too), or maybeeven simpler to maintain: just another property. But then, thenofailback would have to be either moved to some other place...

Or it is still allowed in location rules, but either the more detailedrule wins (e.g. one rule has node1 without a priority and the other doeshave node1 with a priority) or the first location rule with a specificnode wins and the other is ignored. But this is already confusing whenwriting it out here...

I'd prefer users to write the former (and make this the dynamic'canonical' form when selecting nodes), but as with colocation rules itcould make sense to separate them for specific reasons / use cases.

And another reason why it could still make sense to go that way is toallow "negative" location rules at a later point, which makes sense inlarger environments, where it's easier to write opt-out rules thanopt-in rules, so I'd like to keep that path open for the future.



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules

Reply via email to