Re: [slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Stephen Cousins
Hi Sushil,

Try changing the NodeName specification to:

NodeName=localhost CPUs=96 State=UNKNOWN Gres=gpu:8

Also:

TaskPlugin=task/cgroup

Best,
Steve

On Wed, Apr 6, 2022 at 9:56 AM Sushil Mishra wrote:
> Dear SLURM users,
>
> I am very new to Slurm and need some help in configuring slurm in
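Steve's suggested Gres=gpu:8 only helps if Slurm also knows which device files back those eight GPUs, and the count in slurm.conf has to agree with gres.conf. As a minimal sketch, assuming the GPUs appear as /dev/nvidia0 through /dev/nvidia7 and the node is named localhost as above, the hypothetical helper `gpu_node_config` below emits the slurm.conf lines together with one matching gres.conf entry per GPU. GresTypes=gpu is the standard slurm.conf switch for GPU GRES; check the real device paths with `ls /dev/nvidia*` on the machine before using the output.

```python
"""Hypothetical helper: build matching slurm.conf and gres.conf lines for one GPU node."""


def gpu_node_config(hostname: str, cpus: int, n_gpus: int) -> tuple[str, str]:
    """Return (slurm_conf_fragment, gres_conf_fragment).

    Assumes the GPUs are exposed as /dev/nvidia0 .. /dev/nvidia{n_gpus - 1}.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    slurm_conf = "\n".join([
        "GresTypes=gpu",           # enable the gpu GRES type
        "TaskPlugin=task/cgroup",  # as suggested in the reply
        f"NodeName={hostname} CPUs={cpus} State=UNKNOWN Gres=gpu:{n_gpus}",
    ])
    # One File= entry per GPU, so Slurm can hand out individual devices.
    gres_conf = "\n".join(
        f"NodeName={hostname} Name=gpu File=/dev/nvidia{i}" for i in range(n_gpus)
    )
    return slurm_conf, gres_conf


if __name__ == "__main__":
    slurm_part, gres_part = gpu_node_config("localhost", cpus=96, n_gpus=8)
    print("# slurm.conf")
    print(slurm_part)
    print("# gres.conf")
    print(gres_part)
```

Generating both fragments from one `n_gpus` value is the point of the design: a mismatch between the Gres count and the listed devices is an easy mistake to make by hand.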

Re: [slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Kamil Wilczek
Hello,

try commenting out the line:

AutoDetect=nvml

and then restart "slurmd" and "slurmctld". Job allocations to the same GPU might be an effect of automatic MPS configuration, though I'm not 100% sure: https://slurm.schedmd.com/gres.html#MPS_Management

Kind Regards
--
Kamil Wilczek
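Before and after following Kamil's suggestion, it can help to see what gres.conf actually declares. The sketch below is a hypothetical checker, not part of Slurm: it reports whether an uncommented AutoDetect=nvml line is active and whether any entry declares Name=mps, which is how MPS is configured as a GRES on the linked MPS Management page. It assumes simple one-entry-per-line gres.conf syntax with `#` comments and does not handle every form Slurm's own parser accepts.

```python
"""Hypothetical gres.conf checker: is NVML autodetection on, and is MPS declared?"""

from dataclasses import dataclass


@dataclass
class GresReport:
    autodetect_nvml: bool  # an active (uncommented) AutoDetect=nvml line exists
    mps_declared: bool     # some entry declares Name=mps


def check_gres_conf(text: str) -> GresReport:
    autodetect = False
    mps = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        for token in line.split():
            key, sep, value = token.partition("=")
            if not sep:
                continue  # not a key=value token
            key, value = key.lower(), value.lower()
            if key == "autodetect" and value == "nvml":
                autodetect = True
            elif key == "name" and value == "mps":
                mps = True
    return GresReport(autodetect_nvml=autodetect, mps_declared=mps)


if __name__ == "__main__":
    sample = "AutoDetect=nvml\nName=gpu File=/dev/nvidia[0-7]\n"
    print(check_gres_conf(sample))
```

Pass it the contents of the node's gres.conf (its location depends on the installation). After editing the file, slurmd and slurmctld still need the restart described above.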

[slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Sushil Mishra
Dear SLURM users,

I am very new to Slurm and need some help configuring it on a single-node machine. This machine has 8x Nvidia GPUs and a 96-core CPU. The vendor has set up a "LocalQ", but it somehow runs all the calculations on GPU 0. If I submit 4 independent jobs at a time, it starts ru
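One way to confirm whether independent jobs really share GPU 0 is to have each job report the devices it was given. When Slurm allocates GPUs through GRES (for example a job submitted with `--gres=gpu:1`), it sets CUDA_VISIBLE_DEVICES in the job's environment; if the variable is missing, CUDA programs see every GPU and typically default to device 0. The hypothetical helper `visible_gpus` below parses that variable. It is a sketch under these assumptions, not a Slurm tool.

```python
"""Hypothetical per-job check: which GPUs did Slurm expose to this job?"""

import os
from typing import Mapping, Optional


def visible_gpus(env: Mapping[str, str] = os.environ) -> Optional[list[str]]:
    """Parse CUDA_VISIBLE_DEVICES from env.

    Returns None when the variable is unset (no restriction: CUDA sees every GPU),
    otherwise the device identifiers (indices or UUIDs) in order.
    """
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [dev.strip() for dev in raw.split(",") if dev.strip()]


if __name__ == "__main__":
    job_id = os.environ.get("SLURM_JOB_ID", "?")  # set by Slurm inside a job
    gpus = visible_gpus()
    if gpus is None:
        print(f"job {job_id}: CUDA_VISIBLE_DEVICES unset, all GPUs visible")
    else:
        print(f"job {job_id}: visible GPUs {gpus}")
```

Saved as, say, a hypothetical `check_gpus.py`, it could be submitted four times with `sbatch --gres=gpu:1 --wrap "python3 check_gpus.py"`. If devices are constrained with cgroups, the indices may be renumbered per job, so "0" does not necessarily mean physical GPU 0; running `nvidia-smi` inside each job is a more reliable cross-check.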