On 25.03.25 at 16:12, Daniel Kral wrote:
> Add a mechanism to the node selection subroutine, which enforces the
> colocation rules defined in the rules config.
> 
> The algorithm manipulates the set of nodes directly, which the service
> is allowed to run on, depending on the type and strictness of the
> colocation rules, if there are any.
> 
> This makes it depend on the prior removal of any nodes, which are
> unavailable (i.e. offline, unreachable, or weren't able to start the
> service in previous tries) or are not allowed to be run on otherwise
> (i.e. HA group node restrictions) to function correctly.
> 
> Signed-off-by: Daniel Kral <d.k...@proxmox.com>
> ---
>  src/PVE/HA/Manager.pm      | 203 ++++++++++++++++++++++++++++++++++++-
>  src/test/test_failover1.pl |   4 +-
>  2 files changed, 205 insertions(+), 2 deletions(-)
> 
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 8f2ab3d..79b6555 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -157,8 +157,201 @@ sub get_node_priority_groups {
>      return ($pri_groups, $group_members);
>  }
> 
I feel like these helper functions should rather go into the colocation
plugin or some other module to not bloat up Manager.pm more.

> +=head3 get_colocated_services($rules, $sid, $online_node_usage)
> +
> +Returns a hash map of all services, which are specified as being in a positive
> +or negative colocation in C<$rules> with the given service with id C<$sid>.
> +
> +Each service entry consists of the type of colocation, strictness of colocation
> +and the node the service is currently assigned to, if any, according to
> +C<$online_node_usage>.
> +
> +For example, a service C<'vm:101'> being strictly colocated together (positive)
> +with two other services C<'vm:102'> and C<'vm:103'> and loosely colocated
> +separate with another service C<'vm:104'> results in the hash map:
> +
> +    {
> +        'vm:102' => {
> +            affinity => 'together',
> +            strict => 1,
> +            node => 'node2'
> +        },
> +        'vm:103' => {
> +            affinity => 'together',
> +            strict => 1,
> +            node => 'node2'
> +        },
> +        'vm:104' => {
> +            affinity => 'separate',
> +            strict => 0,
> +            node => undef

Why is the node undef here?

> +        }
> +    }
> +
> +=cut
> +
> +sub get_colocated_services {
> +    my ($rules, $sid, $online_node_usage) = @_;
> +
> +    my $services = {};
> +
> +    PVE::HA::Rules::Colocation::foreach_colocation_rule($rules, sub {
> +        my ($rule) = @_;
> +
> +        for my $csid (sort keys %{$rule->{services}}) {
> +            next if $csid eq $sid;
> +
> +            $services->{$csid} = {
> +                node => $online_node_usage->get_service_node($csid),
> +                affinity => $rule->{affinity},
> +                strict => $rule->{strict},
> +            };
> +        }
> +    }, {
> +        sid => $sid,
> +    });
> +
> +    return $services;
> +}
> +
> +=head3 get_colocation_preference($rules, $sid, $online_node_usage)
> +
> +Returns a list of two hashes, where each is a hash map of the colocation
> +preference of C<$sid>, according to the colocation rules in C<$rules> and the
> +service locations in C<$online_node_usage>.
> +
> +The first hash is the positive colocation preference, where each element
> +represents properties for how much C<$sid> prefers to be on the node.
> +Currently, this is a binary C<$strict> field, which means either it should be

s/it/the service/

> +there (C<0>) or must be there (C<1>).
> +
> +The second hash is the negative colocation preference, where each element
> +represents properties for how much C<$sid> prefers not to be on the node.
> +Currently, this is a binary C<$strict> field, which means either it should not

s/it/the service/

> +be there (C<0>) or must not be there (C<1>).
> +
> +=cut
> +
> +sub get_colocation_preference {
> +    my ($rules, $sid, $online_node_usage) = @_;
> +
> +    my $services = get_colocated_services($rules, $sid, $online_node_usage);

The name $services is a bit too generic, maybe $colocation_per_service or
something? Maybe it would be better to just merge this one and the helper
above into a single one? I.e. just handle the info while iterating the
rules directly, instead of creating a novel temporary per-service data
structure and iterating twice.

> +
> +    my $together = {};
> +    my $separate = {};
> +
> +    for my $service (values %$services) {
> +        my $node = $service->{node};
> +
> +        next if !$node;
> +
> +        my $node_set = $service->{affinity} eq 'together' ? $together : $separate;
> +        $node_set->{$node}->{strict} = $node_set->{$node}->{strict} || $service->{strict};
> +    }
> +
> +    return ($together, $separate);
> +}
> +
> +=head3 apply_positive_colocation_rules($together, $allowed_nodes)
> +
> +Applies the positive colocation preference C<$together> on the allowed node
> +hash set C<$allowed_nodes> directly.
> +
> +Positive colocation means keeping services together on a single node, and
> +therefore minimizing the separation of services.
> +
> +The allowed node hash set C<$allowed_nodes> is expected to contain any node,
> +which is available to the service, i.e.
> +each node is currently online, is
> +available according to other location constraints, and the service has not
> +failed running there yet.
> +
> +=cut
> +
> +sub apply_positive_colocation_rules {
> +    my ($together, $allowed_nodes) = @_;
> +
> +    return if scalar(keys %$together) < 1;
> +
> +    my $mandatory_nodes = {};
> +    my $possible_nodes = PVE::HA::Tools::intersect($allowed_nodes, $together);
> +
> +    for my $node (sort keys %$together) {
> +        $mandatory_nodes->{$node} = 1 if $together->{$node}->{strict};
> +    }
> +
> +    if (scalar keys %$mandatory_nodes) {
> +        # limit to only the nodes the service must be on.
> +        for my $node (keys %$allowed_nodes) {
> +            next if exists($mandatory_nodes->{$node});

Style nit: I'd avoid using exists() if you explicitly expect a set value.
Otherwise, it can break because of accidental auto-vivification in the
future.

> +
> +            delete $allowed_nodes->{$node};
> +        }
> +    } elsif (scalar keys %$possible_nodes) {
> +        # limit to the possible nodes the service should be on, if there are any.
> +        for my $node (keys %$allowed_nodes) {
> +            next if exists($possible_nodes->{$node});
> +
> +            delete $allowed_nodes->{$node};

This seems wrong. Non-strict rules should not limit the allowed nodes.
See below for more on this.

> +        }
> +    }
> +}
> +
> +=head3 apply_negative_colocation_rules($separate, $allowed_nodes)
> +
> +Applies the negative colocation preference C<$separate> on the allowed node
> +hash set C<$allowed_nodes> directly.
> +
> +Negative colocation means keeping services separate on multiple nodes, and
> +therefore maximizing the separation of services.
> +
> +The allowed node hash set C<$allowed_nodes> is expected to contain any node,
> +which is available to the service, i.e. each node is currently online, is
> +available according to other location constraints, and the service has not
> +failed running there yet.
> +
> +=cut
> +
> +sub apply_negative_colocation_rules {
> +    my ($separate, $allowed_nodes) = @_;
> +
> +    return if scalar(keys %$separate) < 1;
> +
> +    my $mandatory_nodes = {};
> +    my $possible_nodes = PVE::HA::Tools::set_difference($allowed_nodes, $separate);
> +
> +    for my $node (sort keys %$separate) {
> +        $mandatory_nodes->{$node} = 1 if $separate->{$node}->{strict};
> +    }
> +
> +    if (scalar keys %$mandatory_nodes) {
> +        # limit to the nodes the service must not be on.
> +        for my $node (keys %$allowed_nodes) {
> +            next if !exists($mandatory_nodes->{$node});
> +
> +            delete $allowed_nodes->{$node};
> +        }
> +    } elsif (scalar keys %$possible_nodes) {
> +        # limit to the nodes the service should not be on, if any.
> +        for my $node (keys %$allowed_nodes) {
> +            next if exists($possible_nodes->{$node});
> +
> +            delete $allowed_nodes->{$node};
> +        }
> +    }
> +}
> +
> +sub apply_colocation_rules {
> +    my ($rules, $sid, $allowed_nodes, $online_node_usage) = @_;
> +
> +    my ($together, $separate) = get_colocation_preference($rules, $sid, $online_node_usage);
> +
> +    apply_positive_colocation_rules($together, $allowed_nodes);
> +    apply_negative_colocation_rules($separate, $allowed_nodes);

I think there could be a problematic scenario with
* no strict positive rules, but loose positive rules
* strict negative rules
where apply_positive_colocation_rules() will limit $allowed_nodes in such
a way that the strict negative rules cannot be satisfied anymore
afterwards.

I feel like what we actually want from non-strict rules is not to limit
the allowed nodes at all, but only to express preferences. After scoring,
we could:
1. always take a colocation preference node if present, no matter what
   the usage score is
2. have a threshold to not follow through, if there is a non-colocation
   preference node with a relatively much better usage score
3. somehow massage it into the score itself. E.g.
   every node that would be preferred by colocation gets a 0.5 multiplier
   score adjustment while other scores are unchanged - remember that
   lower score is better.
4. [insert your suggestion here]

So to me it seems like there should be a helper that gives us:
1. list of nodes that satisfy strict rules - these we can then intersect
   with the $pri_nodes
2. list of nodes that are preferred by non-strict rules - these we can
   consider after scoring

> +}
> +
>  sub select_service_node {
> -    my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
> +    # TODO Cleanup this signature post-RFC
> +    my ($rules, $groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
>  
>      my $group = get_service_group($groups, $online_node_usage, $service_conf);
>  
> @@ -189,6 +382,8 @@ sub select_service_node {
>  
>      return $current_node if (!$try_next && !$best_scored) && $pri_nodes->{$current_node};
>  
> +    apply_colocation_rules($rules, $sid, $pri_nodes, $online_node_usage);
> +
>      my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
>      my @nodes = sort {
>          $scores->{$a} <=> $scores->{$b} || $a cmp $b

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
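[Editor's note] To make the review's suggestion 3 concrete, here is a minimal sketch of the proposed two-list approach combined with a 0.5 score multiplier for colocation-preferred nodes. It is written in Python purely for illustration (the code under review is Perl), and every name in it (pick_node, strict_nodes, preferred_nodes) is hypothetical, not part of the patch:

```python
# Hypothetical sketch of the reviewer's suggestion, not code from the patch:
# strict colocation rules restrict the candidate set (intersection), while
# non-strict rules only bias the usage score. Lower score wins, so a 0.5
# multiplier makes preferred nodes more attractive without excluding others.

def pick_node(scores, strict_nodes, preferred_nodes):
    """scores: node -> usage score (lower is better).
    strict_nodes: nodes still allowed after applying strict rules.
    preferred_nodes: nodes favored by non-strict colocation rules."""
    # 1. strict rules limit the allowed nodes (intersection with $pri_nodes)
    candidates = {n: s for n, s in scores.items() if n in strict_nodes}
    # 2. non-strict rules only adjust the score (0.5 multiplier)
    adjusted = {
        n: (s * 0.5 if n in preferred_nodes else s)
        for n, s in candidates.items()
    }
    # deterministic tie-break on the node name, mirroring the sort in
    # select_service_node()
    return min(adjusted, key=lambda n: (adjusted[n], n))

scores = {'node1': 10.0, 'node2': 12.0, 'node3': 30.0}
# node2 is preferred by a loose rule: 12.0 * 0.5 = 6.0 beats node1's 10.0
print(pick_node(scores, {'node1', 'node2', 'node3'}, {'node2'}))
```

With an empty preference set the plain lowest-score node wins, so loose rules never shrink the candidate set, which avoids the strict-negative-rule conflict described above.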