Re: [slurm-users] How to view GPU indices of the completed jobs?

2020-06-22 Thread Kota Tsuyuzaki
> if I remember right, if you use cgroups, CUDA_VISIBLE_DEVICES always starts from zero. So this is NOT the index of the GPU. Thanks. Just FYI, when I tested the environment variables with Slurm 19.05.2 + proctrack/cgroup configuration, it looks like CUDA_VISIBLE_DEVICES fits the indices on the hos

[slurm-users] Installing GPU Features of Slurm 20

2020-06-22 Thread Petrillo, Neale A. (Contractor)
Hi! I'm trying to install Slurm 20.02 on my cluster with the GPU features. However, only my compute nodes have GPUs attached and so when I try to install the slurm-slurmctld RPM on my head node it fails saying it requires the NVIDIA control software. How do other folks work around this? Do you

Re: [slurm-users] [EXT] Set a per-cluster default limit of the number of active cores per user at a time

2020-06-22 Thread Paddy Doyle
Hi Sean, That sounds like a workable solution, thanks for the suggestion! I was hoping that there was something else I'd missed in the docs that lets you do it directly via sacctmgr without having to edit slurm.conf as well. Thanks again, Paddy On Sat, Jun 20, 2020 at 09:20:02AM +1000, Sean Cr

Re: [slurm-users] Installing GPU Features of Slurm 20

2020-06-22 Thread Grigory Shamov
It needs only the software (for Slurm 19 it was the NVML library), so I have copied that library over to all my non-GPU nodes. Grigory Shamov University of Manitoba From: slurm-users on behalf of Petrillo, Neale A. (Contractor) Sent: Monday, June 22, 2020 10

Re: [slurm-users] How to exclude nodes in sbatch/srun?

2020-06-22 Thread Riebs, Andy
In fairness to our friends at SchedMD, this was filed as an enhancement request, not a bug. Since this is an open source project, there are 2 good ways to make it happen: 1. Fund someone, like SchedMD, to make the change. 2. Make the changes yourself, and submit the changes. Alter

Re: [slurm-users] [External] How to exclude nodes in sbatch/srun?

2020-06-22 Thread Paul Edmon
For the record, we filed a bug on this years ago: https://bugs.schedmd.com/show_bug.cgi?id=3875 It hasn't been fixed yet, though everyone seems to agree it is a good idea. Florian's suggestion is probably the best stopgap until this feature is implemented. -Paul Edmon- On 6/22/2020 7:11 AM, Flo

Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-22 Thread Diego Zuccato
On 16/06/20 16:23, Loris Bennett wrote: > Thanks for pointing this out - I hadn't been aware of this. Is there anywhere in the documentation where this is explicitly stated? I don't remember. Seems Michael's experience is different. Possibly some other setting influences that behaviour. Ma

Re: [slurm-users] [External] How to exclude nodes in sbatch/srun?

2020-06-22 Thread Florian Zillner
Durai, To overcome this, we use noXXX features like below. Users can then request “8268&noGPU&EDR” to select nodes with 8268s on EDR without GPUs, for example.
# scontrol show node node5000 | grep AvailableFeatures
AvailableFeatures=192GB,2933MHz,SD530,Platinum,8268,rack25,EDR,sb7890_0416,enc2

[slurm-users] How to exclude nodes in sbatch/srun?

2020-06-22 Thread Durai Arasan
Hi, The sbatch/srun commands have the "--constraint" option to select nodes with certain features. With this you can specify AND, OR, matching OR operators. But there is no NOT operator. How do you exclude nodes with a certain feature in the "--constraint" option? Or is there another option that c