Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-05-01 Thread Nate Coraor
led puppet overnight, jobs running longer than 30 minutes are completing, and cgroups are persisting, whereas before that, they were not.

--nate

On Mon, Apr 30, 2018 at 5:47 PM, Andy Georges wrote:
> On 30 Apr 2018, at 22:37, Nate Coraor wrote:
> > Hi Shawn,

Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-30 Thread Nate Coraor
On Mon, Apr 30, 2018 at 4:37 PM, Nate Coraor wrote:
> Hi Shawn,
>
> I'm wondering if you're still seeing this. I've recently enabled
> task/cgroup on 17.11.5 running on CentOS 7 and just discovered that jobs
> are escaping their cgroups. For me this is res

Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-30 Thread Nate Coraor
Hi Shawn,

I'm wondering if you're still seeing this. I've recently enabled task/cgroup on 17.11.5 running on CentOS 7 and just discovered that jobs are escaping their cgroups. For me this is resulting in a lot of jobs ending in OUT_OF_MEMORY that shouldn't, because it appears slurmd thinks the oom
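
For anyone wanting to check whether a running job's processes have left their cgroups, here is a minimal sketch. It is not taken from the thread. It assumes the cgroup v1 layout found on CentOS 7, where each line of /proc/<pid>/cgroup reads "hierarchy-ID:controller-list:path", and it assumes that a process still confined by Slurm has "/job_" somewhere in its path for the controller in question. The function names and the marker string are hypothetical; adjust them to your site.

```python
def parse_proc_cgroup(text):
    """Map each cgroup v1 controller name to its path.

    `text` is the content of /proc/<pid>/cgroup, one
    "hierarchy-ID:controller-list:path" entry per line.
    """
    paths = {}
    for line in text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _hier_id, controllers, path = parts
        # Several controllers can share one hierarchy, e.g. "cpu,cpuacct".
        for controller in controllers.split(","):
            if controller:
                paths[controller] = path
    return paths


def escaped_controllers(text, controllers=("memory", "devices"), marker="/job_"):
    """Return the controllers whose path no longer contains `marker`.

    Assumption: a confined process sits under a path containing "/job_";
    this marker is an illustrative guess, not something stated in the thread.
    """
    paths = parse_proc_cgroup(text)
    return [c for c in controllers if marker not in paths.get(c, "")]
```

To use it on a compute node, read the file for one of the job's PIDs, for example `escaped_controllers(open("/proc/%d/cgroup" % pid).read())`, and treat a non-empty result as a hint that the process is no longer in the job's cgroup for those controllers.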