I agree with Chris, and I was able to find the cause. As Chris said, the problem is the cgroup. When I submit a job to Slurm requesting 1 gres:gpu, Slurm assigns the job to a node that has enough resources, and before handing the resources to the job it sets up a cgroup environment on that node. The problem is that Docker uses its own cgroup configuration, so the device cgroup Slurm creates does not apply inside the container. That's why I get the correct information on the Slurm side but not on the Docker side. Here is my workaround for getting the right information inside Docker.
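For anyone who wants to see the difference for themselves, a minimal sketch is below. It assumes cgroup v1 with Slurm's task/cgroup plugin and the default devices hierarchy; the exact paths (slurm/uid_*/job_*) can differ depending on your configuration:

    # Inside the Slurm job step (outside Docker): the devices cgroup that
    # Slurm created for this job only whitelists the granted GPU device.
    cat /sys/fs/cgroup/devices/slurm/uid_${UID}/job_${SLURM_JOBID}/devices.list

    # Inside the Docker container: the process sits in Docker's own
    # devices cgroup, so nvidia-smi sees every GPU on the node.
    grep devices /proc/self/cgroup
    nvidia-smi -L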
    scontrol show job=$SLURM_JOBID --details | grep GRES_IDX | awk -F "IDX:" '{print $2}' | awk -F ")" '{print $1}'

scontrol show with the --details option reports GRES_IDX, so I use this value in my application. Please refer to this command if you are hitting the same problem. A sketch of how I pass the index on to a container follows after the quoted message below.

-----Original Message-----
From: "Chris Samuel" <ch...@csamuel.org>
To: <slurm-users@lists.schedmd.com>
Cc:
Sent: 2019-01-07 (Mon) 11:59:09
Subject: Re: [slurm-users] gres with docker problem

On 4/1/19 5:48 am, Marcin Stolarek wrote:

> I think that the main reason is the lack of access to some /dev "files"
> in your docker container. For singularity nvidia plugin is required,
> maybe there is something similar for docker...

That's unlikely. The problem isn't that nvidia-smi isn't working in Docker because of a lack of device files; the problem is that it's seeing all 4 GPUs and thus is no longer being controlled by the device cgroup that Slurm is creating.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
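As promised above, here is a minimal sketch of how I use the GRES index when launching a container. Passing it through NVIDIA_VISIBLE_DEVICES and using the nvidia container runtime are assumptions about my setup, not something Slurm does for you:

    #!/bin/bash
    # Run inside the Slurm job (batch script or interactive step).

    # Ask Slurm which GRES index this job was actually granted on this node.
    idx=$(scontrol show job=$SLURM_JOBID --details \
          | grep GRES_IDX \
          | awk -F "IDX:" '{print $2}' \
          | awk -F ")" '{print $1}')

    # Hand only that GPU to the container, instead of relying on the device
    # cgroup (which Docker replaces with its own). NVIDIA_VISIBLE_DEVICES is
    # honoured by nvidia-docker / the nvidia container runtime.
    docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES="$idx" \
        nvidia/cuda:10.0-base nvidia-smi -L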