> On Jan 22, 2020, at 4:55 PM, Chris Lambertus <c...@apache.org> wrote:
> 
> Folks,
> 
> Over the last week or so we have received many reports of broken builds due 
> to nodes out of resources. As noted in INFRA-19751, builds appear to fail yet 
> continue to run, using up all available resources on a build node.
> 
> I will be implementing a system to kill jenkins processes based on duration 
> of run. My initial feeling is to kill any single process which has been 
> running for longer than one hour real-time. 
> 
> I will also be implementing a system to kill/purge all docker containers 
> which have been running for over 6 hours. 

Additionally, orphaned docker jobs are causing major resource contention. I 
will be adding a weekly job that runs docker system prune --all && service 
docker restart on the build nodes.
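
For anyone who wants to check their own jobs against the proposed limits, here 
is a minimal sketch of the age check. It only illustrates the one-hour and 
six-hour thresholds discussed in this thread and is not the actual Infra 
tooling. The names PROCESS_LIMIT, CONTAINER_LIMIT and over_limit are 
hypothetical, and the start times are assumed to have been collected already 
(for example from ps or docker inspect).

```python
from datetime import datetime, timedelta, timezone

# Thresholds proposed in this thread; the constant names are illustrative only.
PROCESS_LIMIT = timedelta(hours=1)
CONTAINER_LIMIT = timedelta(hours=6)


def over_limit(started_at, limit, now=None):
    """Return the sorted IDs whose run time exceeds `limit`.

    `started_at` maps an ID (PID or container ID) to a timezone-aware
    start time. `now` can be passed explicitly to make the check repeatable.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return sorted(
        item_id for item_id, start in started_at.items() if now - start > limit
    )


# Example with made-up container IDs and start times.
now = datetime(2020, 1, 22, 12, 0, tzinfo=timezone.utc)
containers = {
    "build-a": now - timedelta(hours=7),
    "build-b": now - timedelta(hours=2),
}
stale = over_limit(containers, CONTAINER_LIMIT, now)
```

With the sample data above, only build-a has been running for more than six 
hours, so it is the only ID returned. Anything actually killing processes or 
containers would be built on top of a check like this.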

-Chris



> I am seeking input on these time limits, especially from those with larger 
> builds. Is there any reason a -single process- or a docker container should 
> run for more than 1 or 6 hours respectively?
> 
> Thanks,
> Chris
> ASF Infra
> 
