Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes

Daniel Kral Wed, 07 May 2025 01:32:06 -0700

On 5/2/25 11:33, Fiona Ebner wrote:

Am 30.04.25 um 13:09 schrieb Daniel Kral:

On 3/25/25 16:12, Daniel Kral wrote:

   sub select_service_node {
-    my ($groups, $online_node_usage, $sid, $service_conf,
$current_node, $try_next, $tried_nodes, $maintenance_fallback,
$best_scored) = @_;
+    # TODO Cleanup this signature post-RFC
+    my ($rules, $groups, $online_node_usage, $sid, $service_conf,
$current_node, $try_next, $tried_nodes, $maintenance_fallback,
$best_scored) = @_;


I'm currently trying to clean up the helper's signature here, but doing
something like

sub select_service_node {
     my ($service_info, $affinity_info, $try_next, $best_scored) = @_;

     my ($sid, $service_conf, $current_node) = $service_info->@{qw(sid
config current_node)};
     my ($rules, $groups, $online_node_usage, $tried_nodes,
$maintenance_fallback) =
     $affinity_info->@{qw(rules groups online_node_usage failed_nodes
maintenance_node)};

would require us to create helper structures on all four call sites (one
of them is just the test case ./test_failover1.pl), or introduce another
helper to just create them for passing it here and immediately de-
structuring it in select_service_node(...):

sub get_service_affinity_info {
     my ($self, $sid, $cd, $sd) = @_;

     my $service_info = {
     sid => $sid,
     config => $cd,
     current_node => $sd->{node},
     };

     my $affinity_info = {
     rules => $self->{rules},
     groups => $self->{groups},
     failed_nodes => $sd->{failed_nodes},
     maintenance_node => $sd->{maintenance_node},
     online_node_usage => $self->{online_node_usage},
     };

     return ($service_info, $affinity_info);
};

Also the call site in next_state_recovery(...) does not pass $sd-

{failed_nodes}, $sd->{maintenance_node} and $best_scored to it. AFAICS

$sd->{failed_nodes} should be undef in next_state_recovery(...) anyway,
but I feel like I have missed some states it could be in there. And $sd-

{maintenance_node} could be set anytime.


I think it makes sense to have it explicitly (rather than just
implicitly) opt-out of $try_next just like the caller for
rebalance_on_request_start. Without $try_next, the $tried_nodes
parameter does not have any effect (the caller for
rebalance_on_request_start passes it, but select_service_node() won't
read it if $try_next isn't set).

The caller in next_state_recovery() should also pass $best_scored IMHO,
so that it is fully aligned with the caller for
rebalance_on_request_start. It won't be an actual change result-wise,
because for recovery, the current node is not available, so it already
cannot be the result, but it makes sense semantically. We want the best
scored node for recovery. And having the two callers look the same is a
simplification too.

Yes, I put a patch in-between so that all the arguments are madeexplicit before there's any refactoring. Then the commit message canhold some rationale why for reviewers and future reference.

In the same spirit I'd also like to make it more explicit that fornext_state_recovery() it can have $current_node and $maintenance_nodeset, but they're irrelevant as those will be invalid nodes to selectanyway, i.e. $group_members->{$node} and $pri_nodes->{$node} are bothfalse as both contain only available nodes in the first place, if I'mnot mistaken.

If there's nothing speaking against that, I'd prefer to elevate
select_service_node(...) to be a method as it needs quite a lot of state
anyway, especially as we will need global information about other
services than just the current one in the future anyway.


I don't have strong feelings about this, both approaches seem fine to me.

So, I'd do something like

sub select_service_node {
     my ($self, $sid, $service_conf, $sd, $mode) = @_;

     my ($rules, $groups, $online_node_usage) = $self->@{qw(rules groups
online_node_usage)};


If we don't want to make it a method, we could still pass these ones
separately. After implementing location rules, $groups would be dropped
anyways.

Right, then I'll go for keeping it as a helper as this wouldn't be amethod in its own right anyway.

     my ($current_node, $tried_nodes, $maintenance_fallback) = $self-

@{qw(node failed_nodes maintenance_node)};


It's $sd->... here.


ACK.

here. It's not fancy as in there's a well-defined interface one can
immediately see what this helper needs (as it has access to the whole
$self) and doesn't have the guarantees of a standalone helper (won't
touch $self), but I think it could be better than creating helper
structures which are only pass a message, which is immediately
destructured anyway. We could also just pass $self slightly differently,
but I don't see much difference there.

The $mode could then be a enumeration of e.g. whether $try_next (e.g.
'try_again') or $best_scored (e.g. 'rebalance') is used (and can be


Having a mode sounds good to me. I don't think it should be called
'rebalance', the best-scored semantics should apply to recovery too, see
above.

Right, it wouldn't sound right there then. I think I'll just simply gowith 'try-next' and 'best-score' for both ;).

The only reference to the previous $best_scored would be dropped inselect_service_node() as both states are now mutually exclusive, whichalso shows that setting $mode = 'best-score' for next_state_recovery()shouldn't change the result.



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes

Reply via email to