In my experience, applying patches d52d8f4f0 and f07f53fc13 to a Slurm 17.11.7
source tree fixes this issue. It only requires restarting slurmctld.
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
Acting Group Lead, Computational Systems Group
National Energy Research Scientific Computing C
Thank you, guys,
Let's wait for 17.11.8. Is there an estimated release date?
Best regards,
Taras
On Wed, Jul 11, 2018 at 12:11 AM Kilian Cavalotti <kilian.cavalotti.w...@gmail.com> wrote:
> On Tue, Jul 10, 2018 at 10:34 AM, Taras Shapovalov
> wrote:
> > I noticed the commit that can be relat
On Tue, Jul 10, 2018 at 10:34 AM, Taras Shapovalov
wrote:
> I noticed the commit that can be related to this:
>
> https://github.com/SchedMD/slurm/commit/bf4cb0b1b01f3e165bf12e69fe59aa7b222f8d8e
Yes. See also this bug: https://bugs.schedmd.com/show_bug.cgi?id=5240
This commit will be reverted in
What is the change in the commit you're thinking about?
Original message
From: Taras Shapovalov
Date: 10/07/2018 19:34 (GMT+01:00)
To: slurm-us...@schedmd.com
Subject: [slurm-users] DefMemPerCPU is reset to 1 after upgrade
Hey guys,
When we upgraded to 17.11.7, th
rm-us...@schedmd.com"
Subject: [slurm-users] DefMemPerCPU is reset to 1 after upgrade
Hey guys,
After we upgraded to 17.11.7, all jobs on some clusters are killed with
these messages:
slurmstepd: error: Job 374 exceeded memory limit (1308 > 1024), being killed
slurmstepd: error: Exceeded job
Hey guys,
After we upgraded to 17.11.7, all jobs on some clusters are killed with
these messages:
slurmstepd: error: Job 374 exceeded memory limit (1308 > 1024), being killed
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: *** JOB 374 ON node002 CANCELLED AT
2018-06-28T0
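
For anyone who wants to check which jobs were hit, the "exceeded memory limit" lines quoted above can be pulled out of a log with a small script. This is only a minimal sketch: it assumes the message wording shown in this thread (other Slurm versions may phrase it differently), and the helper name `parse_memory_kill` is made up for illustration.

```python
import re

# Pattern for the slurmstepd message quoted in this thread, e.g.
# "slurmstepd: error: Job 374 exceeded memory limit (1308 > 1024), being killed"
# Assumption: the wording matches what 17.11.7 printed here; adjust if yours differs.
LIMIT_RE = re.compile(
    r"Job (?P<job>\d+) exceeded memory limit \((?P<used>\d+) > (?P<limit>\d+)\)"
)

def parse_memory_kill(line):
    """Return (job_id, used, limit) as ints, or None if the line does not match."""
    m = LIMIT_RE.search(line)
    if m is None:
        return None
    return int(m.group("job")), int(m.group("used")), int(m.group("limit"))

line = "slurmstepd: error: Job 374 exceeded memory limit (1308 > 1024), being killed"
print(parse_memory_kill(line))
```

A limit that comes out the same for every killed job, regardless of what was requested, would be consistent with the default being applied instead of the configured value.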