Re: [slurm-users] What happens if GPU GRES exceeding number of GPUs per node

2024-01-18 Thread Juergen Salk
Hi Wirawan, in general `--gres=gpu:6` actually means six units of a generic resource named `gpu` per node. Each unit may or may not be associated with a physical GPU device. I'd check the node configuration for the number of gres=gpu resource units that are configured for that node. scont
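
The comparison suggested above (requested gpu units versus the units configured on a node) could be sketched as follows. This is only an illustration: it assumes the node's Gres string has already been copied by hand, for example from the `Gres=` field that `scontrol show node` prints, and the helper name `gpu_units_from_gres` and the sample strings are hypothetical, not taken from the thread.

```python
import re


def gpu_units_from_gres(gres: str) -> int:
    """Sum the unit counts of all 'gpu' entries in a Slurm-style Gres string.

    Assumed entry shape: name[:type][:count], optionally followed by a
    parenthesised suffix such as "(S:0-1)". An entry without an explicit
    count is treated as one unit (an assumption of this sketch).
    """
    total = 0
    # Split on commas that are not inside a parenthesised suffix.
    for entry in re.split(r",(?![^()]*\))", gres):
        # Drop a trailing "(...)" suffix, e.g. socket binding information.
        entry = re.sub(r"\(.*\)$", "", entry.strip())
        parts = entry.split(":")
        if parts[0] != "gpu":
            continue  # some other generic resource, or "(null)"
        last = parts[-1]
        total += int(last) if len(parts) > 1 and last.isdigit() else 1
    return total


# Hypothetical values: a node configured with four gpu units and a job
# submitted with --gres=gpu:6, as in the question.
configured = gpu_units_from_gres("gpu:4")
requested = gpu_units_from_gres("gpu:6")
if requested > configured:
    print(f"job asks for {requested} gpu units, node has {configured}")
```

If the node really is configured with only four gpu units, a request for six per node would be expected to exceed what the node advertises, which is why the configured count is worth checking first.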

[slurm-users] What happens if GPU GRES exceeding number of GPUs per node

2024-01-17 Thread Purwanto, Wirawan
Hi, at my HPC center I found a SLURM job that was submitted with --gres=gpu:6, although the cluster has only four GPUs per node. It is a parallel job. Here is a printout of some relevant fields: AllocCPUS 30 AllocGRES gpu:6 A