On 4/15/19 8:15 AM, Peter Steinbach wrote:
> We had a feeling that cgroups might be the better option. Could you point us to documentation that says cgroups are a requirement?
Oh, it's not a requirement. It's just that without it, nothing stops a process from using GPUs outside its allocation. You're relying on the user not overriding the environment variables that Slurm sets, and on the code honouring them.
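To make the "it's only an environment variable" point concrete, here is a minimal sketch. It assumes the allocation is communicated to the job through CUDA_VISIBLE_DEVICES (the value "0" below is an illustrative assumption, not taken from your cluster), and it never touches a GPU. It only shows that a job script can rewrite the variable before launching its code, and the child process simply sees whatever it is handed. With cgroup device confinement (ConstrainDevices=yes in cgroup.conf), the device files for unallocated GPUs are blocked regardless of what the variable says.

```python
import os
import subprocess
import sys


def visible_devices_in_child(env_overrides: dict) -> str:
    """Launch a child Python process and return the CUDA_VISIBLE_DEVICES it sees."""
    env = dict(os.environ)
    env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES', ''))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Assumed value a scheduler might export for a one-GPU allocation.
    allocated = {"CUDA_VISIBLE_DEVICES": "0"}
    print(visible_devices_in_child(allocated))

    # Nothing but convention stops the job from widening it before launch.
    overridden = dict(allocated, CUDA_VISIBLE_DEVICES="0,1")
    print(visible_devices_in_child(overridden))
```

Without device cgroups, the second child would be free to open both GPUs even though only one was allocated.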
> No HT involved here at any point, neither on our cluster nor within the dockerized Slurm installation I was playing with.
OK, that's weird.

One thing I noticed looking at your bug report is that the node reports:

    AllocTRES=cpu=1,mem=500M

with no mention of GPUs being allocated, despite also saying:

    Gres=gpu:titanxp:2

and your jobs saying:

    GRES_IDX=gpu(IDX:0-1)

and:

    GRES_IDX=gpu(IDX:)

That second one is extra odd, because there's no index there.

What Slurm version are you on?

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA