Hi Wirawan,
in general `--gres=gpu:6´ actually means six units of a generic resource named
`gpu´
per node. Each unit may or may not be associated with a physical GPU device.
I'd check the node configuration for the number of gres=gpu resource units that
are
configured for that node.
scont
Hi,
In my HPC center, I found a SLURM job that was submitted with --gres=gpu:6
whereas the cluster has only four GPUs per node each. It is a parallel job.
Here are some relevant field printout:
AllocCPUS 30
AllocGRES gpu:6
A