On Thu, Jul 2, 2009 at 18:36, Nigel Daley<ni...@apache.org> wrote:
> Folks,
>
> I'd really like to move builds off the Hudson master. Here's a proposal:
>
> 1) We move the Hadoop related builds (Common, HDFS, Mapreduce, Pig,
> ZooKeeper, Hive, HBase, Chukwa, Avro) off to some other machines (see 4
> below)
>
> 2) That would free up minerva and vesta as Ubuntu build slaves for all the
> other projects (which should be more than enough capacity).
>
> 3) We get permission to use the current lucene.zones slave as a Solaris
> build slave for those projects that really want a Solaris build (how many is
> that I wonder?)
>
> 4) We add a bunch more Ubuntu slaves to hudson.zones out of a pool of
> publicly IP'd yahoo.net machines my employer has for Hadoop related builds.

So -- what's the situation with this proposal? I'm all in favour.

I've been monitoring Hudson closely for the past 2 weeks, and it's
clear that it's over-capacity. Even with the limiting band-aids I've
been putting in place to control overlong builds, right now, the build
queue has 8 pending builds waiting for a free executor, and that's
been pretty much the normal situation. It needs more machines.

Paul, are you still -1?

--j.

> Cheers,
> Nige
>
>
> On Jun 30, 2009, at 6:17 AM, Justin Mason wrote:
>
>> On Tue, Jun 30, 2009 at 13:46, sebb<seb...@gmail.com> wrote:
>>>
>>> On 30/06/2009, Jukka Zitting <jukka.zitt...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Another Tuscany-2x build [1] was stuck with lots of OOM errors and
>>>> other failures in the console log. I killed the build as it was taking
>>>> already almost 7 hours, which is much more than the 40 minutes used by
>>>> the last successful build.
>>>>
>>>> [1] http://hudson.zones.apache.org/hudson/job/Tuscany-2x/116/
>>>
>>> It looked to me as though the build was stalled, i.e. Hudson was not
>>> able to detect/recover from the situation. Is this a known problem?
>>>
>>> Is there any way to give the builds a bit more memory?
>>>
>>> It looks like Tuscany has not built successfully for a long while, so
>>> this is likely to keep happening.
>>>
>>> It's a pity that the console output does not have time-stamps, or it
>>> would be a lot easier to tell that nothing was happening.
>>
>> It could be the entire machine was under memory pressure, given those
>> OOM errors. I wonder if that caused the Hudson master to get
>> confused.
>>
>> --j.
>
>

--
--j.