OS: CentOS 8.5
Slurm: 22.05
I recently upgraded to 22.05. The upgrade was successful, but after a while I started
to see the following messages in the slurmdbd.log file:
error: We have more time than is possible (9344745+7524000+0)(16868745) > 12362400 for cluster CLUSTERNAME(3434) from 2024-09-18T13
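The numbers in this message can be checked by hand: 9344745 + 7524000 + 0 = 16868745, and the limit 12362400 equals 3434 × 3600. That suggests (an assumption, not confirmed anywhere in this thread) that the 3434 in parentheses is the cluster's CPU count, that the limit is one hour's worth of CPU-seconds, and that the three summed terms are allocated, down, and planned-down time. The minimal Python sketch below parses such a line and redoes that arithmetic. The regex, the helper name check_rollup_line, and the field interpretations are all hypothetical and are not part of Slurm.

```python
import re

# Hypothetical pattern for the slurmdbd rollup message quoted above.
# Assumed field meanings: (allocated + down + planned-down)(reported sum) > limit,
# followed by the cluster name and, in parentheses, its CPU count.
ROLLUP_RE = re.compile(
    r"\((\d+)\+(\d+)\+(\d+)\)\((\d+)\) > (\d+) for cluster ([^\s(]+)\((\d+)\)"
)

SECONDS_PER_HOUR = 3600  # assumption: the rollup window is one hour


def check_rollup_line(line: str) -> dict:
    """Parse a 'more time than is possible' line and recompute its sums."""
    m = ROLLUP_RE.search(line)
    if m is None:
        raise ValueError("not a 'more time than is possible' message")
    alloc, down, planned_down, reported_total, limit = map(int, m.groups()[:5])
    cluster, cpus = m.group(6), int(m.group(7))
    total = alloc + down + planned_down
    return {
        "cluster": cluster,
        "total": total,
        "total_matches_log": total == reported_total,
        "limit": limit,
        # True if the limit is exactly CPU count x one hour of seconds
        "limit_is_cpus_times_hour": limit == cpus * SECONDS_PER_HOUR,
        # how many CPU-seconds the recorded usage exceeds the limit by
        "excess_seconds": total - limit,
    }


if __name__ == "__main__":
    line = ("error: We have more time than is possible "
            "(9344745+7524000+0)(16868745) > 12362400 "
            "for cluster CLUSTERNAME(3434) from 2024-09-18T13")
    result = check_rollup_line(line)
    print(result["excess_seconds"])  # prints 4506345
```

Under these assumptions, the hour in question recorded roughly 4.5 million more CPU-seconds than 3434 CPUs could physically supply, which is consistent with Ryan's point that oversubscribed hardware, rather than overlapping partitions, would be the usual cause.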
I don’t think you should expect this from overlapping nodes in partitions, but
rather when you’re allowing the hardware itself to be oversubscribed.
Was your upgrade in this window?
I would suggest looking for runaway jobs, which you’ve already done; beyond that,
I’m not sure what else to check.
--
#BlackLivesMatter
The upgrade was a couple of hours prior to the messages appearing in the logs.
SS
From: Ryan Novosielski
Sent: Thursday, September 19, 2024 12:08:42 AM
To: Sajesh Singh
Cc: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] SlurmDBD errors
Hello,
is it possible to change a pending job from --exclusive to
--exclusive=user? I tried scontrol update jobid=... oversubscribe=user,
but it seems to only accept yes or no.
Gerhard
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...