We also noticed the same thing with 21.08.5. In the 21.08 series
SchedMD changed the way they handle cgroups to set the stage for cgroups
v2 (see: https://slurm.schedmd.com/SLUG21/Roadmap.pdf). The 21.08.5 release
introduced a bug fix which then caused mpirun not to pin properly
(particularly for olde
Hi Paul,
On 10/02/2022 14:33, Paul Brunk wrote:
Now we see a problem in which the OOM killer is in some cases
predictably killing job steps that don't seem to deserve it. In some
cases these are job scripts and input files which ran fine before our
Slurm upgrade. More details follow, but th
Tks a lot to both Steffen and Paul!
That clarifies everything!
On 10/02/2022 14:11, Paul Brunk wrote:
Hi:
slurmctld runs as an unprivileged user ('slurm' by default) who probably
doesn't have read access to the user's job scripts. 'sbatch' submits
the scripts via network to slurmctld, w
Hello all:
We upgraded from 20.11.8 to 21.08.5 (CentOS 7.9, Slurm built without
pmix support) recently. After that, we found that in many cases,
'mpirun' was causing multi-node MPI jobs to have all MPI ranks within
a node run on the same core. We've moved on to 'srun'.
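For anyone hitting the same thing, a rough sketch of what we mean by moving
to srun, plus a quick binding sanity check (the binary name and the
--mpi=pmi2 choice are only illustrative; adjust for your own build):

  # launch through srun with explicit core binding
  srun --mpi=pmi2 --cpu-bind=cores ./mpi_app

  # check that each task gets its own cores rather than all sharing one
  srun --cpu-bind=cores bash -c 'grep Cpus_allowed_list /proc/self/status'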
Now we see a problem in w
Hi:
slurmctld runs as an unprivileged user ('slurm' by default) who probably
doesn't have read access to the user's job scripts. 'sbatch' submits the
scripts via network to slurmctld, who stores them in the slurm.conf
'StateSaveLocation', and sends them to slurmds at dispatch time, who store t
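To make that concrete, a rough sketch (the spool path is only an example):

  # slurm.conf on the controller
  StateSaveLocation=/var/spool/slurmctld

  # the copy slurmctld keeps can be dumped back out later, e.g.
  scontrol write batch_script <jobid> saved_job.sh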
On Thu, 2022-02-10 at 11:59:58 +0100, Diego Zuccato wrote:
> Hello all.
>
> Does slurmctld (or slurmdbd) need to access the same filesystems used on
> submit nodes? Or do they just receive the needed information in the request?
>
> Does slurmctld need read access to /home/userA/myjob.sh or does it r
Hello all.
Does slurmctld (or slurmdbd) need to access the same filesystems used on
submit nodes? Or do they just receive the needed information in the request?
Say the submit node and the worker nodes mount /home via NFS. Then userA
submits a job with
sbatch /home/userA/myjob.sh
Does slurmctl
Well, ‘sacctmgr modify cluster name=***’ is exactly what we want, and
inspired by this command, we found that ‘sacctmgr show cluster’ can
clearly list all the cluster associations.
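For example, something along these lines (the format fields are just the
ones we happened to look at):

  sacctmgr show cluster format=Cluster,GrpTRES,GrpJobs,MaxJobs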
But during testing we found another problem. When a limit is defined at both
the cluster level and the user level, the sma
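For reference, defining a limit at both levels looks roughly like this
(the cluster name and values are placeholders, following the name= form above):

  # limit on the cluster's root association
  sacctmgr modify cluster name=mycluster set MaxJobs=100

  # the same kind of limit on a single user's association
  sacctmgr modify user where name=userA cluster=mycluster set MaxJobs=10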