Re: [slurm-users] gres:gpu management

2019-05-23 Thread Daniel Vecerka
I have tested deviceQuery in the sbatch job again and it works now:
  Device PCI Domain ID / Bus ID / location ID:   0 / 97 / 0
  Device PCI Domain ID / Bus ID / location ID:   0 / 137 / 0
  Device PCI Domain ID / Bus ID / location ID:   0 / 98 / 0
  Device PCI Domain ID / Bus ID / location ID:   0 /
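(For anyone repeating this check: one way is to submit several single-GPU jobs and compare the Bus IDs they report. This is only a sketch; the deviceQuery path, partition name and output file names are assumptions, not taken from the thread.)

  # submit a few single-GPU jobs, each running deviceQuery
  for i in 1 2 3 4; do
      sbatch --partition=gpu --gres=gpu:1 --output=devquery_%j.out \
             --wrap="$HOME/cuda-samples/bin/deviceQuery"
  done
  # once they finish, distinct Bus IDs (97, 137, 98, ...) mean the jobs
  # were spread over different physical V100s
  grep "Bus ID" devquery_*.out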

Re: [slurm-users] gres:gpu management

2019-05-23 Thread Daniel Vecerka
Jobs end on the same GPU. If I run CUDA deviceQuery in the sbatch job I get:
  Device PCI Domain ID / Bus ID / location ID:   0 / 97 / 0
  Device PCI Domain ID / Bus ID / location ID:   0 / 97 / 0
  Device PCI Domain ID / Bus ID / location ID:   0 / 97 / 0
  Device PCI Domain ID / Bus ID / location ID:   0
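(A simple way to see what each job is actually given is to print CUDA_VISIBLE_DEVICES together with the PCI bus IDs the driver reports inside the job. A minimal sketch, with the partition name assumed from the thread:)

  #!/bin/bash
  #SBATCH --partition=gpu
  #SBATCH --gres=gpu:1
  # print the device index Slurm handed to this job and the PCI bus ID
  # of every GPU the job can see
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
  nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader

If several concurrent jobs all print the same bus ID, they really are sharing one physical card rather than each just reporting device 0 of its own allocation.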

Re: [slurm-users] gres:gpu management

2019-05-23 Thread Aaron Jackson
> Hello,
>
> we are running 18.08.6 and have problems with GRES GPU management.
> There is a "gpu" partition with 12 nodes, each with 4 Tesla V100 cards. An
> allocation of the GPUs is working, GPU management for sbatch/srun jobs
> is working too - CUDA_VISIBLE_DEVICES is correctly set according >

[slurm-users] gres:gpu management

2019-05-23 Thread Daniel Vecerka
Hello,

we are running 18.08.6 and have problems with GRES GPU management. There is a "gpu" partition with 12 nodes, each with 4 Tesla V100 cards. An allocation of the GPUs is working, and GPU management for sbatch/srun jobs is working too - CUDA_VISIBLE_DEVICES is correctly set according --gres=gp
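(For context, a GRES setup for nodes like these would typically look roughly as follows; the node names, Type tag and device paths are assumptions for illustration only, not the poster's actual configuration.)

  # slurm.conf (excerpt) - other node parameters omitted
  GresTypes=gpu
  NodeName=gpu[01-12] Gres=gpu:4

  # gres.conf on each GPU node: four V100s exposed as /dev/nvidia0-3
  NodeName=gpu[01-12] Name=gpu Type=v100 File=/dev/nvidia[0-3]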