As I understand it, that setting means "always have at least X usable nodes
up", and nodes that are running jobs count toward X. So it removes the wait
time for the first X jobs submitted, but any jobs beyond that will have to
wait for the power_up sequence.
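For concreteness, a minimal slurm.conf sketch of what I mean; the node range,
the count of 4, and the program paths are placeholders for your own setup:

    # Keep 4 usable nodes in nid[10-20] exempt from suspension, so the
    # first few jobs land on warm nodes without waiting for power_up.
    SuspendExcNodes=nid[10-20]:4
    SuspendTime=600                              # suspend nodes idle >10 min
    SuspendProgram=/usr/local/sbin/node_suspend  # site-specific script
    ResumeProgram=/usr/local/sbin/node_resume    # site-specific script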
Brian Andrus
On 11/22/2023 6:58 AM, Davide DelVento wrote:
Could you be more specific as to what you want?
Is there a specific user you want to control, or no user should get more
than x cpus in the partition? Or no single job should get more than x cpus?
The details matter to determine the right approach and right settings.
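For illustration only, since I don't yet know your setup: the per-user and
per-job cases usually map to different QOS limits, roughly along these lines
(the QOS name, the partition line, and the value 64 are just placeholders):

    # Cap the total CPUs a single user can hold under this QOS
    sacctmgr add qos cpu_cap
    sacctmgr modify qos cpu_cap set MaxTRESPerUser=cpu=64

    # ...or cap what any single job may request
    sacctmgr modify qos cpu_cap set MaxTRESPerJob=cpu=64

    # ...then attach the QOS to the partition in slurm.conf
    PartitionName=batch Nodes=node[01-10] QOS=cpu_cap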
Brian Andrus
On 11/21
I assume you mean the sentence about dynamic MIG at
https://slurm.schedmd.com/gres.html#MIG_Management
Could it be supported? I think so, but only if one of their paying
customers (that could be you) asks for it.
On Wed, Nov 22, 2023 at 11:24 AM Aaron Kollmann <
aaron.kollm...@student.hpi.de> wrote:
Hello All,
I am currently working on a research project and we are trying to find
out whether we can use NVIDIA's multi-instance GPU (MIG) feature dynamically
in SLURM.
For instance:
- a user submits a job that wants a GPU, but none is available
- now SLURM will reconfigure a MIG GPU to create a partition (GPU instance)
  for that job, roughly the manual steps sketched below
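For context, what we would want SLURM to drive automatically is roughly what
one does by hand with nvidia-smi today (profile IDs vary by GPU model, and
enabling MIG mode assumes no workload is running on that GPU):

    # Enable MIG mode on GPU 0
    nvidia-smi -i 0 -mig 1

    # List the GPU instance profiles available on GPU 0
    nvidia-smi mig -i 0 -lgip

    # Create a GPU instance (e.g. profile 9 = 3g.20gb on an A100-40GB)
    # together with its default compute instance
    nvidia-smi mig -i 0 -cgi 9 -C

As far as we can tell, Slurm's gres configuration would then also need to
pick up the new MIG device, which seems to be what the static-only note in
the documentation is about.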
I've started playing with powersave and have a question about
SuspendExcNodes. The documentation at
https://slurm.schedmd.com/power_save.html says:
"For example nid[10-20]:4 will prevent 4 usable nodes (i.e. IDLE and not
DOWN, DRAINING or already powered down) in the set nid[10-20] from being
powered down."
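In case it is useful while I experiment, the power state of each node shows
up as a suffix on its state in sinfo (per the sinfo man page, "~" means
powered down, "#" powering up, "%" powering down); the partition name here is
just from my test setup:

    # Show node names with their compact state, e.g. "idle~" or "idle#"
    sinfo -N -p batch -o "%N %t"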