Hello,

Slurm is complicated software and the docs can be dense, so I'm looking
for some clarification please.

We have a node configured with threads as CPUs: 1 socket, 4 cores per
socket, 2 threads per core = 8 CPUs.

I would like to implement cgroups because some of our users are quite happy
to use every thread on the node regardless of other users.

We have TaskPlugin=task/cgroup, and when testing I noticed that the number
of threads/CPUs being allocated was rounded up to the nearest even number.
I presume this was due to cgroups treating a core as a CPU, rather than a
thread as a CPU.

So I set TaskPluginParam=Threads, but Slurm is still allowing the use of
more threads than were requested.

In particular, I'm running this test:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=3

stress-ng --cpu 5 --cpu-method all --io 5 --vm 1 --vm-bytes 1G \
    --timeout 600s --quiet


I was hoping that the cgroup would kill the job for using too many CPUs,
but I've discovered that's not how stress-ng works.

Regardless, when running this, I noted that squeue shows I've been
allocated 3 CPUs, but on the server itself I'm seeing 4 CPUs being used.
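
For reference, a check along these lines could go at the top of the batch
script to show which CPUs the job is actually confined to, as opposed to
what squeue reports. This is only a rough sketch: count_cpus is a helper
name made up here, and it assumes a Linux node where /proc/self/status is
readable.

```shell
#!/bin/bash
# Hypothetical helper: count the CPUs in a Linux CPU list such as "0-2,6".
count_cpus() {
    local list=$1 total=0 part lo hi
    local IFS=,
    for part in $list; do
        lo=${part%-*}    # start of the range (or the single CPU id)
        hi=${part#*-}    # end of the range (or the single CPU id)
        total=$(( total + hi - lo + 1 ))
    done
    echo "$total"
}

# Cpus_allowed_list is the set of CPUs this process is permitted to run on.
if [ -r /proc/self/status ]; then
    allowed=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
    echo "allowed CPUs: $allowed ($(count_cpus "$allowed") total)"
fi

# Set by Slurm inside a job; reported as "unset" when run outside one.
echo "SLURM_CPUS_ON_NODE=${SLURM_CPUS_ON_NODE:-unset}"
```

If the allowed list covers more CPUs than SLURM_CPUS_ON_NODE, the extra
thread is coming from the confinement itself rather than from stress-ng.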

What have I done wrong? Is it possible to have granular control at the
thread level with cgroups?

cheers
L.


