rm in slurm_spank_auks to
stop auks) seems to work (so far).
--
Robbert Eggermont
Intelligent Systems Support & Data Steward | TU Delft
+31 15 27 83234 | Building 28, Floor 5, Room W660
Available Mon, Wed-Fri
different than last night, when the nodes were drained because
of a batch job failure...
I'll report back when I find out more.
Robbert
Dear all,
We just upgraded from 17.02.10 to 17.11.5 (using auks and cgroups) and
we are hitting a nasty problem: finished jobs are hanging (indefinitely)
in the completing state.
On the node I see only two processes remaining: 'slurmstepd' and its
child 'auks'. Looking at the slurmstepd wit