On 26/03/18 12:43, Robbert Eggermont wrote:
Does this sound familiar to anyone?
Does the slurmd log report it trying to kill the auks process?
Also you might want to have a look at:
https://bugs.schedmd.com/show_bug.cgi?id=4733
to see if that bug fits what you're seeing. Basically I get a
Dear all,
We just upgraded from 17.02.10 to 17.11.5 (using auks and cgroups) and
we are hitting a nasty problem: finished jobs are hanging (indefinitely)
in the completing state.
On the node I see only two processes remaining: 'slurmstepd' and its
child 'auks'. Looking at the slurmstepd wit
Hi everyone,
Is there a guide anywhere on how to figure out why jobs aren't being
started?
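One possible starting point is to tally the reason Slurm reports for every pending job. What follows is a minimal sketch, not an official tool. It assumes the Slurm client command `squeue` is on PATH and uses its `-h` (no header), `-t PD` (pending jobs only) and `-o "%i %r"` (job id and reason) options. The helper names `tally_pending_reasons` and `query_pending_reasons` are hypothetical.

```python
import subprocess
from collections import Counter


def tally_pending_reasons(squeue_output: str) -> Counter:
    """Count pending jobs per reason (hypothetical helper).

    Expects output of `squeue -h -t PD -o "%i %r"`: one "<jobid> <reason>"
    per line. The reason is everything after the first whitespace.
    """
    counts = Counter()
    for line in squeue_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        reason = parts[1] if len(parts) > 1 else "(unknown)"
        counts[reason] += 1
    return counts


def query_pending_reasons() -> Counter:
    """Run squeue and tally its output (assumes Slurm tools are installed)."""
    # -h drops the header, -t PD keeps pending jobs only,
    # %i is the job id and %r the reason the job is in its current state.
    result = subprocess.run(
        ["squeue", "-h", "-t", "PD", "-o", "%i %r"],
        capture_output=True,
        text=True,
        check=True,
    )
    return tally_pending_reasons(result.stdout)


if __name__ == "__main__":
    # Print reasons from most to least common.
    for reason, n in query_pending_reasons().most_common():
        print(f"{n:6d}  {reason}")
```

If nearly every job shows `Priority`, those jobs are waiting behind higher-priority work. `sprio` shows the priority factors per job, and `scontrol show job <jobid>` gives the full state of a single job.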
We have a cluster with nodes of mixed sizes/powers; currently roughly half
the cluster is idle even though there are ~5k jobs queued.
All jobs are queued due to priority while only 1 job is marked as waiting