Folks,

Over the last week or so we have received many reports of broken builds caused by nodes running out of resources. As noted in INFRA-19751, builds appear to fail yet continue to run, using up all available resources on a build node.

I will be implementing a system to kill Jenkins processes based on how long they have been running. My initial plan is to kill any single process that has been running for longer than one hour of real time. I will also be implementing a system to kill and purge all Docker containers that have been running for more than 6 hours.

I am seeking input on these time limits, especially from those with larger builds. Is there any reason a -single process- should run for more than 1 hour, or a Docker container for more than 6 hours?

Thanks,
Chris
ASF Infra