On 26/03/18 20:50, Robbert Eggermont wrote:
The suggested fix (use SIGKILL instead of SIGTERM in slurm_spank_auks
to stop auks) seems to work (so far).
Excellent, so glad to hear that!
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
FYI:
I think we've run into this issue:
https://github.com/hautreux/auks/issues/24
It seems to be triggered by a change in signal blocking in slurmstepd:
https://github.com/SchedMD/slurm/commit/d2c83807097605f10f0b19cf2c5cb5c2c6f35ad6
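
For anyone wondering why the choice of signal would matter here: a blocked signal mask is inherited across fork() and kept across exec(), so a child started while SIGTERM is blocked never sees a SIGTERM sent to it, whereas SIGKILL cannot be blocked. Whether that is exactly what happens between slurmstepd and auks is an assumption based on the issue linked above, not something checked against the Slurm source. The minimal sketch below only demonstrates the general mechanism with a plain sleeping child; it is not the auks or Slurm code, and block_sigterm and stop_child are hypothetical helper names.

```python
import signal
import subprocess
import sys


def block_sigterm():
    # Runs in the child between fork() and exec().
    # The blocked mask set here is preserved across exec().
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})


def stop_child(sig, wait=1.0):
    """Start a sleeping child with SIGTERM blocked, send it `sig`,
    and return its exit status, or None if it is still running
    after `wait` seconds."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        preexec_fn=block_sigterm,
    )
    child.send_signal(sig)
    try:
        return child.wait(timeout=wait)
    except subprocess.TimeoutExpired:
        # The signal stayed pending; clean up with SIGKILL.
        child.kill()
        child.wait()
        return None


if __name__ == "__main__":
    print("SIGTERM:", stop_child(signal.SIGTERM))
    print("SIGKILL:", stop_child(signal.SIGKILL))
```

On a POSIX system the SIGTERM attempt should time out (the child keeps running with the signal pending), while the SIGKILL attempt ends the child right away, which would match a parent waiting forever on a child it only ever sent SIGTERM.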
The suggested fix (use SIGKILL instead of SIGTERM in slurm_s
Hi Chris,
On 26-03-18 05:04, Christopher Samuel wrote:
Does the slurmd log report it trying to kill the auks process?
The first thing I need to do is turn up the logging verbosity.
https://bugs.schedmd.com/show_bug.cgi?id=4733
The fact that auks is hanging around makes me wonder if this i
On 26/03/18 12:43, Robbert Eggermont wrote:
Does this sound familiar to anyone?
Does the slurmd log report it trying to kill the auks process?
Also you might want to have a look at:
https://bugs.schedmd.com/show_bug.cgi?id=4733
to see if that bug fits what you're seeing. Basically I get a
Dear all,
We just upgraded from 17.02.10 to 17.11.5 (using auks and cgroups) and
we are hitting a nasty problem: finished jobs are hanging (indefinitely)
in the completing state.
On the node I see only two processes remaining: 'slurmstepd' and its
child 'auks'. Looking at the slurmstepd wit