Re: [slurm-users] 17.11+auks+cgroups: finished jobs hang in completing state

2018-03-26 Thread Robbert Eggermont
rm in slurm_spank_auks to stop auks) seems to work (so far). -- Robbert Eggermont Intelligent Systems Support & Data Steward | TU Delft +31 15 27 83234 | Building 28, Floor 5, Room W660 Available Mon, Wed-Fri

Re: [slurm-users] 17.11+auks+cgroups: finished jobs hang in completing state

2018-03-26 Thread Robbert Eggermont
different than last night, when the nodes were drained because of a batch job failure... I'll report back when I find out more. Robbert

[slurm-users] 17.11+auks+cgroups: finished jobs hang in completing state

2018-03-25 Thread Robbert Eggermont
Dear all, We just upgraded from 17.02.10 to 17.11.5 (using auks and cgroups) and we are hitting a nasty problem: finished jobs are hanging (indefinitely) in the completing state. On the node I see only two processes remaining: 'slurmstepd' and its child 'auks'. Looking at the slurmstepd wit