Re: [slurm-users] slurm power save question

2023-11-22 Thread Brian Andrus
As I understand it, that setting means "Always have at least X nodes up", which includes running jobs. So it stops any wait time for the first X jobs being submitted, but any jobs after that will need to wait for the power_up sequence. Brian Andrus On 11/22/2023 6:58 AM, Davide DelVento wrote

Re: [slurm-users] partition qos without managing users

2023-11-22 Thread Brian Andrus
E.g., could you be more specific as to what you want? Is there a specific user you want to control, or should no user get more than x CPUs in the partition? Or should no single job get more than x CPUs? The details matter to determine the right approach and the right settings. Brian Andrus On 11/21

Re: [slurm-users] Dynamic MIG Question

2023-11-22 Thread Davide DelVento
I assume you mean the sentence about dynamic MIG at https://slurm.schedmd.com/gres.html#MIG_Management. Could it be supported? I think so, but only if one of their paying customers (that could be you) asks for it. On Wed, Nov 22, 2023 at 11:24 AM Aaron Kollmann < aaron.kollm...@student.hpi.de> wrot

[slurm-users] Dynamic MIG Question

2023-11-22 Thread Aaron Kollmann
Hello All, I am currently working on a research project and we are trying to find out whether we can use NVIDIA's Multi-Instance GPU (MIG) dynamically in SLURM. For instance: - a user requests a job and wants a GPU but none is available - now SLURM will reconfigure a MIG GPU to create a part

[slurm-users] slurm power save question

2023-11-22 Thread Davide DelVento
I've started playing with powersave and have a question about SuspendExcNodes. The documentation at https://slurm.schedmd.com/power_save.html says: For example nid[10-20]:4 will prevent 4 usable nodes (i.e IDLE and not DOWN, DRAINING or already powered down) in the set nid[10-20] from being powered
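
For readers following the SuspendExcNodes discussion, here is a minimal Python sketch of one literal reading of the quoted documentation wording: out of a node set, hold back a fixed number of IDLE nodes and treat the remaining IDLE nodes as candidates for power-down. This is a toy model, not Slurm's implementation. The function names, the state strings, and the choice of which idle nodes are held back (simply the first ones in list order) are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumption: only nodes in this state count as "usable" for the held-back count.
USABLE_STATE = "IDLE"


def expand_range(prefix: str, first: int, last: int) -> List[str]:
    """Expand a simple range such as nid[10-20] into individual node names."""
    return [f"{prefix}{i}" for i in range(first, last + 1)]


def nodes_eligible_for_suspend(
    states: Dict[str, str], node_set: List[str], keep: int
) -> List[str]:
    """Return the idle nodes in node_set left over after holding back `keep` of them.

    Hypothetical helper: which idle nodes Slurm would actually hold back is
    not modelled; this sketch just keeps the first `keep` in list order.
    """
    idle = [n for n in node_set if states.get(n) == USABLE_STATE]
    return idle[keep:]


if __name__ == "__main__":
    nodes = expand_range("nid", 10, 20)
    # Made-up example states: three busy nodes, one down, seven idle.
    states = {n: "IDLE" for n in nodes}
    for busy in ("nid10", "nid11", "nid12"):
        states[busy] = "ALLOCATED"
    states["nid13"] = "DOWN"
    print(nodes_eligible_for_suspend(states, nodes, keep=4))
```

With the made-up states in the example, seven nodes are idle, four are held back, and three remain as power-down candidates. Under the other reading raised in the thread (the count includes nodes running jobs), the filter on `USABLE_STATE` would have to change, so treat the sketch as an aid for phrasing the question rather than an answer to it.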