On 2/26/21 8:44 AM, Baldauf, Sebastian Martin wrote:
I just want to ask if someone has an idea how to give a GPU and some
CPUs of a node to one account exclusively but keep the remaining CPUs of
this node available for all users.
To me it looks like using partitions only works for whol
Thank you! I’ll see if this is an option … would be nice.
I’ll see if we can try this.
Best wishes
Volker
> On Feb 25, 2021, at 11:07 PM, Angelos Ching wrote:
>
> I think it's related to the job step launch semantic change introduced in
> 20.11.0, which has been reverted as of 20.11.3, see
We saw something that sounds similar to this. See this bug report:
https://bugs.schedmd.com/show_bug.cgi?id=10196
SchedMD never found the root cause. They thought it might have something to do
with a timing problem on Prolog scripts, but the thing that fixed it for us was
to set GraceTime=0 on
We recently upgraded from Slurm 19.05.8 to 20.11.3. In our
configuration, we have an interruptible partition named 'interruptible'
for long-running, low-priority jobs that use checkpoint/restart. Jobs
that are preempted would be killed and requeued rather than suspended.
This configuration has