[slurm-users] Re: when running `salloc --gres=gpu:1` should I see all gpus in nvidia-smi ?

2024-08-05 Thread Oren via slurm-users
Hi James, I am sort of the admin and trying to understand what the goal should be. Thanks Roberto, I'll have a look at ConstrainDevices On Mon, 5 Aug 2024 at 18:14, Roberto Polverelli Monti via slurm-users < slurm-users@lists.schedm

[slurm-users] Re: when running `salloc --gres=gpu:1` should I see all gpus in nvidia-smi ?

2024-08-05 Thread Roberto Polverelli Monti via slurm-users
Hello Oren, On 8/5/24 3:20 PM, Oren via slurm-users wrote: When I am running this command: `salloc --nodelist=gpu03 -p A4500_Features --gres=gpu:1` and then automatically ssh to the job, what should I see when I run nvidia-smi? All the GPUs in the host or just a single one? That should depen

[slurm-users] ODP: Re: _refresh_assoc_mgr_qos_list: no new list given back keeping cached one

2024-08-05 Thread Rafał Lalik via slurm-users
I had the same issue. After upgrading to slurm-24.05.2 the problem is solved. Try it. R. From: andreas.wiedholz--- via slurm-users Sent: Monday, 15 July 2024 14:32 To: slurm-users@lists.schedmd.com Subject: [slurm-users] Re: _refresh_assoc_mgr_qos_list: no new

[slurm-users] when running `salloc --gres=gpu:1` should I see all gpus in nvidia-smi ?

2024-08-05 Thread Oren via slurm-users
Hello, When I am running this command: `salloc --nodelist=gpu03 -p A4500_Features --gres=gpu:1` and then automatically ssh to the job, what should I see when I run nvidia-smi? All the GPUs in the host or just a single one? Thanks

[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-05 Thread Daniel Letai via slurm-users
I think the issue is more severe than you describe. Slurm juggles the needs of many jobs. Just because there are some resources available at the exact second a job starts, doesn't mean those resources are not pre-allocated for some future job waiting for e

[slurm-users] Re: problem with squeue --json with version 24.05.1

2024-08-05 Thread Markus Köberl via slurm-users
For me the problem is now fixed with SLURM 24.05.2 regards Markus Köberl On Wednesday, 3 July 2024 15:34:37 CEST Ümit Seren wrote: > We experience the same issue. > > SLURM 24.05.1 segfaults with squeue --json and squeue --json=v0.0.41 but > works with squeue --json=v0.0.40 > > > From: Markus