On Tue, Jul 7, 2009 at 07:23, Paul Querna <p...@querna.org> wrote:
> I am mostly curious how adding more build slaves solves these
> reliability problems, since they all seem to stem from builds taking
> excessive amounts of time, freezing, or having whacky OOM issues.
After this mail, I spent a couple of days monitoring "hung" builds. A couple of them were indeed caused by frozen or OOMing tests. These have been resolved with the "build timeout" plugin, which is now set to time out long-running builds after 2 hours for those projects. I haven't observed any builds that the build timeout couldn't deal with, btw.

However, the majority of backlogs were due to contention for the limited number of executors, particularly the 2 on the main instance. A few projects perform 1.5-hour Maven deployments from it. While these were going on, it was routine to see ~15 other builds queueing up.

So IMO, yep, we really do need to expand the executor pool.

--j.