> On Jan 22, 2020, at 4:55 PM, Chris Lambertus <c...@apache.org> wrote:
>
> Folks,
>
> Over the last week or so we have received many reports of broken builds due
> to nodes running out of resources. As noted in INFRA-19751, builds appear to
> fail yet continue to run, using up all available resources on a build node.
>
> I will be implementing a system to kill Jenkins processes based on duration
> of run. My initial feeling is to kill any single process which has been
> running for longer than one hour real-time.
>
> I will also be implementing a system to kill/purge all docker containers
> which have been running for over 6 hours.
Additionally, orphaned docker jobs are causing major resource contention. I
will be adding a weekly job to run docker system prune --all && service docker
restart.
-Chris
>
>
> I am seeking input on these time limits, especially from those with larger
> builds. Is there any reason a -single process- or a docker container should
> run for more than 1 or 6 hours respectively?
>
> Thanks,
> Chris
> ASF Infra
>