On 25.04.25 at 10:36, Daniel Kral wrote:
> On 4/24/25 12:12, Fiona Ebner wrote:
>> On 25.03.25 at 16:12, Daniel Kral wrote:
>>> Canonicalization
>>> ----------------
>>>
>>> Additionally, colocation rules are currently simplified as follows:
>>>
>>> - If there are multiple positive colocation rules with common
>>>   services and the same strictness, these are merged to a single
>>>   positive colocation rule.
>>
>> Do you intend to do that when writing the configuration file? I think
>> rules are better left unmerged from a user perspective. For example:
>>
>> - services 1, 2 and 3 should strictly stay together, because of
>>   reason A
>> - services 1 and 3 should strictly stay together, because of a
>>   different reason B
>>
>> Another scenario might be that the user is currently in the process
>> of editing some rules one-by-one, and then it might also be
>> surprising if something is auto-merged.
>>
>> You can of course always dynamically merge them when doing the
>> computation for the node selection.
>
> This is what I had in mind and I should have made the description for
> that clearer here. It is only for computing the feasibility of the
> rules when (1) creating, (2) updating, and (3) applying them.

Okay, great :) Just wanted to make sure.
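Just to spell out what I mean by merging dynamically, here is a rough
sketch in Python (purely illustrative, not the actual implementation;
the rule fields and the helper are made up):

def merge_positive_rules(rules):
    # Union-find keyed by (strict, service) so that rules with
    # different strictness are never merged with each other.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for rule in rules:
        sids = sorted(rule['services'])
        for sid in sids[1:]:
            root = find((rule['strict'], sids[0]))
            parent[find((rule['strict'], sid))] = root

    # One synthetic rule per connected component of services.
    groups = {}
    for rule in rules:
        for sid in rule['services']:
            groups.setdefault(find((rule['strict'], sid)), set()).add(sid)

    return [{'strict': strict, 'services': svcs}
            for (strict, _), svcs in groups.items()]

rules = [
    {'strict': 1, 'services': {'vm:101', 'vm:102'}},
    {'strict': 1, 'services': {'vm:102', 'vm:103'}},
    {'strict': 0, 'services': {'vm:104', 'vm:105'}},
]
# merge_positive_rules(rules) yields one strict rule covering vm:101,
# vm:102 and vm:103, and leaves the non-strict rule untouched.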
> As suggested by @Lukas off-list, I'll also try to make the check
> selective, e.g. when the user has made an infeasible change to the
> config manually by writing to the file and then wants to create
> another rule. Here it should ignore the infeasible rules (as they'll
> be dropped anyway) and only check whether the added/changed rule is
> infeasible.

How will you select the rule to drop? Applying the rules one-by-one to
find a first violation?

> But as you said, it must not change the user's configuration in the
> end, as that would be very confusing to the user.

Okay, so dropping dynamically. I guess we could also disable such rules
explicitly/mark them as being in violation with other rules somehow: a
tri-state enabled/disabled/conflict status? An explicit field?
Something like that would make such rules easily visible and have the
configuration better reflect the actual status.

As discussed off-list now: we can try to re-enable conflicting rules
the next time the rules are loaded.
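For what it's worth, the "apply one-by-one" idea could look roughly
like this (hypothetical Python sketch; is_feasible() stands in for the
real constraint checker and is not an existing function):

def partition_rules(rules, is_feasible):
    # Walk the rules in config order and keep each rule only if it is
    # still feasible together with the rules accepted so far.
    accepted, conflicting = [], []
    for rule in rules:
        if is_feasible(accepted + [rule]):
            accepted.append(rule)
        else:
            conflicting.append(rule)  # e.g. mark with a conflict state
    return accepted, conflicting

Re-running the same check on the next load would then naturally
re-enable rules that no longer conflict.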
>>> The only thing that I'm unsure about is how we would migrate the
>>> `nofailback` option, since this operates on the group level. If we
>>> keep the `<node>(:<priority>)` syntax and restrict each service to
>>> be part of only one location rule, it'd be easy to have the same
>>> flag. If we go with multiple location rules per service, each
>>> having a score or weight (for the priority), then we wouldn't be
>>> able to have this flag anymore. I think we could keep the semantics
>>> if we move this flag to the service config, but I'm thankful for
>>> any comments on this.
>>
>> My gut feeling is that going for a more direct mapping, i.e. each
>> location rule represents one HA group, is better. The nofailback
>> flag can still apply to a given location rule I think? For a given
>> service, if a higher-priority node is online for any location rule
>> the service is part of, with nofailback=0, it will get migrated to
>> that higher-priority node. It does make sense to have a given
>> service be part of only one location rule then though, since node
>> priorities can conflict between rules.
>
> Yeah, I think this is the reasonable option too.
>
> I briefly discussed this with @Fabian off-list and we also agreed
> that it would be good to map HA groups to location rules as 1:1 as
> possible and keep the nofailback per location rule, as the behavior
> of the HA group's nofailback could still be preserved - at least as
> long as there's only a single location rule per service.
>
> ---
>
> On the other hand, I'll have to take a closer look at whether we can
> do something about the blockers when creating multiple location rules
> where e.g. one has nofailback enabled and the other has not. As you
> already said, they could easily conflict between rules...
>
> My previous idea was to make location rules as flexible as possible,
> so that it would theoretically not matter whether one writes:
>
> location: rule1
>     services: vm:101
>     nodes: node1:2,node2:1
>     strict: 1
>
> or:
>
> location: rule1
>     services: vm:101
>     nodes: node1
>     strict: 1
>
> location: rule2
>     services: vm:101
>     nodes: node2
>     strict: 1
>
> Which rule is more important could be encoded in the order in which
> the rules are defined (if one configures this in the config file
> directly it's easy, and I'd add an API endpoint to realize this over
> the API/WebGUI too), or maybe even simpler to maintain: just another
> property.

We cannot use just the order, because a user might want to give two
nodes the same priority. I'd also like to avoid an implicit
order-priority mapping.

> But then, the nofailback would have to be either moved to some other
> place...
> Or it is still allowed in location rules, but either the more
> detailed rule wins (e.g. one rule has node1 without a priority and
> the other does have node1 with a priority)

Maybe we should prohibit multiple rules with the same service-node
pair? Otherwise, my intuition says that all rules should be considered
and the rule with the highest node priority should win.

> or the first location rule with a specific node wins and the other is
> ignored. But this is already confusing when writing it out here...
>
> I'd prefer users to write the former (and make this the dynamic
> 'canonical' form when selecting nodes), but as with colocation rules
> it could make sense to separate them for specific reasons / use
> cases.

Fair point.

> And another reason why it could still make sense to go that way is to
> allow "negative" location rules at a later point, which makes sense
> in larger environments, where it's easier to write opt-out rules than
> opt-in rules, so I'd like to keep that path open for the future.

We also discussed this off-list: Daniel convinced me that it would be
cleaner if the nofailback property were associated with a given
service rather than with a given location rule. And if we later
support pools as resources, the property should be associated with
(certain or all) services in that pool and defined in the resource
config for the pool.

To avoid the double negation with nofailback=0, it could also be
renamed to a positive property, below called "auto-elevate" (just a
working name).

A small concern of mine was that this makes it impossible to have a
service that only "auto-elevates" to a specific node with a priority,
but not others. This is already not possible right now, and honestly,
that would be quite strange behavior, so not supporting it is unlikely
to hurt real use cases.
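To illustrate the intended semantics, a rough sketch of how such a
per-service flag could play into node selection (hypothetical Python;
"auto-elevate" is only the working name from above, and all field
names are made up, not an agreed-upon design):

def select_node(service, location_rule, online_nodes):
    # location_rule['nodes'] maps node name -> priority (higher wins),
    # e.g. {'node1': 2, 'node2': 1}.
    usable = [n for n in location_rule['nodes'] if n in online_nodes]
    if not usable:
        return service['node']  # no usable node, stay where we are
    best = max(usable, key=lambda n: location_rule['nodes'][n])
    if service.get('auto_elevate'):
        return best  # like nofailback=0: follow the highest priority
    if service['node'] in usable:
        return service['node']  # like nofailback=1: stay put
    return best  # current node not allowed/offline, move anyway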