All,
I have slurmdbd running and everything is (mostly) happy. It's been working
well for months, but fairly regularly, when I do 'sacctmgr show runaway
jobs', I get:
*sacctmgr: error: Slurmctld running on cluster orion is not up, can't check
running jobs*
If I do 'sacctmgr show cluster', it lis
Brian, FWIW, we just restart slurmctld when this happens. I’ll be interested to
hear if there’s a proper fix.
Andy
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Brian Andrus
Sent: Thursday, July 18, 2019 11:01 AM
To: Slurm User Community List
Subject: [slurm-use
Dears,
We are using SLURM 18.08.6. We have 12 nodes with 4 GPUs each and 21
CPU-only nodes. We have 3 partitions:
gpu: only GPU nodes,
cpu: only CPU nodes,
longjobs: all nodes.
Jobs in longjobs have the lowest priority and can be preempted by
suspension. Our goal is to allow using GP