Re: [slurm-users] inconsistent CUDA_VISIBLE_DEVICES with srun vs sbatch

2021-05-20 Thread Christopher Samuel
On 5/19/21 1:41 pm, Tim Carlson wrote:
> but I still don't understand how with "shared=exclusive" srun gives one
> result and sbatch gives another.

I can't either, but I can reproduce it with Slurm 20.11.7. :-/

-- 
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] inconsistent CUDA_VISIBLE_DEVICES with srun vs sbatch

2021-05-19 Thread Tim Carlson
As a follow-up, we did figure out that if we set the partition to not be exclusive, we get something that seems more reasonable. That is, if I use a partition like this:

PartitionName=dlt_shared Nodes=dlt[01-12] Default=NO Shared=YES MaxTime=4-00:00:00 State=UP DefaultTime=8:00:00 wit
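For anyone wanting to reproduce the comparison, a minimal pair of submissions against such a partition would be the following sketch (the partition name matches the config above; the GPU request count and echo wording are assumptions, not taken from the original message):

```shell
# Batch path: ask for one GPU and print what Slurm exposes to the job
sbatch --partition=dlt_shared --gres=gpu:1 \
       --wrap 'echo "sbatch sees CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'

# Interactive path: same request via srun, printing the same variable
srun --partition=dlt_shared --gres=gpu:1 \
     bash -c 'echo "srun sees CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'
```

With a non-exclusive (Shared=YES) partition both paths should report the same single device index; the thread's complaint is that with "shared=exclusive" the two paths disagreed.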

[slurm-users] inconsistent CUDA_VISIBLE_DEVICES with srun vs sbatch

2021-05-19 Thread Tim Carlson
Hey folks,

Here is my setup: slurm-20.11.4 on x86_64 running CentOS 7.x with CUDA 11.1. The relevant parts of slurm.conf and a particular gres.conf file are:

SelectType=select/cons_res
SelectTypeParameters=CR_Core
PriorityType=priority/multifactor
GresTypes=gpu
NodeName=dlt[01-12] Gr
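The NodeName line above is truncated in the archive, presumably mid-way through a Gres= specification. As a purely hypothetical illustration of how such a GPU node definition usually pairs with gres.conf (the GPU count, CPU count, and device paths here are assumptions, not recovered from the original message):

```
# slurm.conf (sketch; Gres count and CPUs are assumed values)
NodeName=dlt[01-12] Gres=gpu:8 CPUs=36 State=UNKNOWN

# gres.conf on each dlt node (sketch; device paths are assumed)
Name=gpu File=/dev/nvidia[0-7]
```

With GresTypes=gpu set globally, Slurm matches each node's Gres= declaration in slurm.conf against the device files listed in that node's gres.conf, and CUDA_VISIBLE_DEVICES is then derived from whichever of those devices the job step is allocated.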