Dear slurm-users,
I am currently looking into how I can deactivate suspending for
nodes. I am interested in both the general case:
Allowing all nodes to be powered up, but without automatic suspending
for any node unless a power-down is triggered manually.
And the special case:
Allow
Xaver,
Your description of the cases is a bit difficult to understand. It seems
you want to have exceptions for power_up. That could be done by
filtering the list of nodes yourself with any script/method you like and
then running power_up on the remaining list.
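A minimal sketch of that filtering idea, in Python. The node names and
both helper functions are made up for illustration; the only Slurm piece
assumed here is "scontrol update NodeName=... State=POWER_UP". The script
only builds and prints the command, it does not run it.

```python
import shlex


def nodes_to_power_up(all_nodes, excluded):
    """Return the nodes that are not excluded, keeping the original order."""
    excluded_set = set(excluded)
    return [node for node in all_nodes if node not in excluded_set]


def power_up_command(nodes):
    """Build (but do not execute) an scontrol power-up command as an argv list.

    Returns None when there is nothing left to power up.
    """
    if not nodes:
        return None
    return ["scontrol", "update", "NodeName=" + ",".join(nodes), "State=POWER_UP"]


if __name__ == "__main__":
    # Hypothetical node names; replace with your own list.
    all_nodes = ["n1", "n2", "n3", "n4"]
    excluded = ["n2", "n4"]

    cmd = power_up_command(nodes_to_power_up(all_nodes, excluded))
    if cmd is not None:
        # Print for review; pass cmd to subprocess.run() once you trust it.
        print(shlex.join(cmd))
```

Printing the command first makes it easy to check which nodes survive the
filter before anything is actually powered up.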
For excluding nodes from being suspend
Hello -
I'm trying to get GPU container jobs working on virtual nodes. The jobs fail
with "Test CUDA failure common.cu:893 'invalid device ordinal'" in the output
file and "slurmstepd: error: mpi/pmix_v3: _errhandler: n4 [0]:
pmixp_client_v2.c:211: Error handler invoked: status = -25, source =