Hi Chris,
Thanks for the detailed feedback. This is Slurm 18.08.5; see also
https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/Dockerfile#L9
Best,
Peter
On 4/15/19 8:15 AM, Peter Steinbach wrote:
We had a feeling that cgroups might be more optimal. Could you point us
to documentation that suggests cgroups to be a requirement?
Oh, it's not a requirement; it's just that without cgroups there is nothing
to stop a process from using GPUs outside of its allocation.
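In case a concrete sketch helps: the device constraint Chris describes is usually enabled with something like the following (a minimal sketch, assuming Slurm was built with cgroup support and that your gres.conf already lists the GPU device files via File=):

```ini
# cgroup.conf -- confine each job to the devices it was allocated,
# so a job without --gres=gpu cannot touch the GPUs at all
ConstrainDevices=yes

# slurm.conf -- track processes and manage tasks through cgroups
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
```

Both daemons need a restart after changing these, and the exact plugin combination may differ per site, so treat this as a starting point rather than a definitive setup.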
Hi Chris,
thanks for following up on this thread.
First of all, you will want to use cgroups to ensure that processes that do
not request GPUs cannot access them.
We had a feeling that cgroups might be more optimal. Could you point us
to documentation that suggests cgroups to be a requirement?
On Monday, 25 March 2019 2:30:34 AM PDT Peter Steinbach wrote:
> I observed a weird behavior of the '--gres-flags=disable-binding'
> option. With the above .conf files, I created a local slurm cluster with
> 3 computes (2 GPUs and 4 cores each).
First of all, you will want to use cgroups to ensure that processes that do
not request GPUs cannot access them.
Same problem here: a job submitted with --gres-flags=disable-binding is
assigned a node, but the job step then fails because all GPUs on that
node are already in use. Log messages:
[2019-04-05T15:29:05.216] error: gres/gpu: job 92453 node node5
overallocated resources by 1, (9 > 8)
Just to follow up: I filed a medium-severity bug report with SchedMD on this:
https://bugs.schedmd.com/show_bug.cgi?id=6763
Best,
Peter
On 3/25/19 10:30 AM, Peter Steinbach wrote:
Dear all,
Using these config files,
https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/gres.conf
https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/slurm.conf
I observed a weird behavior of the '--gres-flags=disable-binding' option.
With the above .conf files, I created a local slurm cluster with 3 computes
(2 GPUs and 4 cores each).
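For reference, the reproduction boils down to oversubscribing the GPUs with binding disabled; a hypothetical submission loop (the sleep payload is just a placeholder for a real GPU workload) would be:

```shell
# On the 3-node cluster above (2 GPUs per node), submit more single-GPU
# jobs than there are GPUs, with CPU<->GPU binding disabled, and watch
# whether the scheduler overallocates GPUs on a node.
for i in $(seq 1 8); do
  sbatch --gres=gpu:1 --gres-flags=disable-binding --wrap="sleep 120"
done
squeue   # inspect where the jobs landed
```

With binding enabled the surplus jobs should simply pend; the surprising part is the overallocation error shown in the slurmctld log instead.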