Hello,

I have a question about the cloud feature, or any other feature that could solve an issue I have with my clusters. To keep it simple, let's say I have a set of 10 nodes. When needed, I move one or more nodes from cluster A to cluster B, and in my slurm.conf I define all the nodes that could possibly be available:

Cluster A:

    NodeName=clusterA-[001-010]

Cluster B:

    NodeName=clusterB-[001-010]

In normal operation I have 5 nodes in cluster A and 5 in cluster B, but when needed I reboot a node from cluster B into cluster A, which leaves 4 nodes in cluster B and 6 in cluster A.

The "issue" is that, since I specified all possible nodes in slurm.conf, when I run sinfo I see:

Cluster A:

    Normal up 1-00:00:00 5 up    clusterA-[001-005]
    Normal up 1-00:00:00 5 down* clusterA-[006-010]

Cluster B:

    Normal up 1-00:00:00 5 up    clusterB-[006-010]
    Normal up 1-00:00:00 5 down* clusterB-[001-005]

And in both slurmctld.log files I have messages like:

    error: Unable to resolve "clusterA-006": Unknown host

or

    error: Unable to resolve "clusterB-001": Unknown host

Since I have a lot of partitions and a lot of nodes, the sinfo output is much harder to read because of the DOWN nodes that are not actually present in the system. Is there a way, feature, or option to keep sinfo from displaying nodes that are NOT present and not reachable by slurmctld, i.e. the ones that produce the "Unable to resolve ... Unknown host" error?

Basically, I'd like to keep all the possible nodes in both slurm.conf files, but have sinfo show only:

Cluster A:

    Normal up 1-00:00:00 5 up clusterA-[001-005]

Cluster B:

    Normal up 1-00:00:00 5 up clusterB-[006-010]

And if I move a node, once that node is actually reachable:

Cluster A:

    Normal up 1-00:00:00 6 up clusterA-[001-006]

Cluster B:

    Normal up 1-00:00:00 4 up clusterB-[007-010]

Thanks,
Fabio

--
Fabio Verzelloni
CSCS - Swiss National Supercomputing Centre
via Trevano 131 - 6900 Lugano, Switzerland
Tel: +41 (0)91 610 82 04