Re: Move builds off of Hudson master
On Thu, Jul 2, 2009 at 18:36, Nigel Daley wrote:
> Folks,
>
> I'd really like to move builds off the Hudson master. Here's a proposal:
>
> 1) We move the Hadoop related builds (Common, HDFS, Mapreduce, Pig,
> ZooKeeper, Hive, HBase, Chukwa, Avro) off to some other machines (see 4
> below)
>
> 2) That would free up minerva and vesta as Ubuntu build slaves for all the
> other projects (which should be more than enough capacity).
>
> 3) We get permission to use the current lucene.zones slave as a Solaris
> build slave for those projects that really want a Solaris build (how many is
> that I wonder?)
>
> 4) We add a bunch more Ubuntu slaves to hudson.zones out of a pool of
> publicly IP'd yahoo.net machines my employer has for Hadoop related builds.

So -- what's the situation with this proposal? I'm all in favour.

I've been monitoring Hudson closely for the past 2 weeks, and it's clear
that it's over-capacity. Even with the limiting band-aids I've been
putting in place to control overlong builds, right now, the build queue
has 8 pending builds waiting for a free executor, and that's been pretty
much the normal situation.

It needs more machines. Paul, are you still -1?

--j.

> Cheers,
> Nige
>
>
> On Jun 30, 2009, at 6:17 AM, Justin Mason wrote:
>
>> On Tue, Jun 30, 2009 at 13:46, sebb wrote:
>>>
>>> On 30/06/2009, Jukka Zitting wrote:
>>>> Hi,
>>>>
>>>> Another Tuscany-2x build [1] was stuck with lots of OOM errors and
>>>> other failures in the console log. I killed the build as it was
>>>> taking already almost 7 hours, which is much more than the 40
>>>> minutes used by the last successful build.
>>>>
>>>> [1] http://hudson.zones.apache.org/hudson/job/Tuscany-2x/116/
>>>
>>> It looked to me as though the build was stalled, i.e. Hudson was not
>>> able to detect/recover from the situation. Is this a known problem?
>>>
>>> Is there any way to give the builds a bit more memory?
>>>
>>> It looks like Tuscany has not built successfully for a long while, so
>>> this is likely to keep happening.
>>>
>>> It's a pity that the console output does not have time-stamps, or it
>>> would be a lot easier to tell that nothing was happening.
>>
>> It could be the entire machine was under memory pressure, given those
>> OOM errors. I wonder if that caused the Hudson master to get
>> confused.
>>
>> --j.
>
>

--
--j.
Re: Move builds off of Hudson master
FWIW, I'm still working on getting the yahoo.net machines properly
imaged. Hoping to have them when I get back from vacation week of
July 27.

Nige

On Jul 17, 2009, at 9:15 AM, Justin Mason wrote:

> On Thu, Jul 2, 2009 at 18:36, Nigel Daley wrote:
>> Folks,
>>
>> I'd really like to move builds off the Hudson master. Here's a
>> proposal:
>>
>> 1) We move the Hadoop related builds (Common, HDFS, Mapreduce, Pig,
>> ZooKeeper, Hive, HBase, Chukwa, Avro) off to some other machines
>> (see 4 below)
>>
>> 2) That would free up minerva and vesta as Ubuntu build slaves for
>> all the other projects (which should be more than enough capacity).
>>
>> 3) We get permission to use the current lucene.zones slave as a
>> Solaris build slave for those projects that really want a Solaris
>> build (how many is that I wonder?)
>>
>> 4) We add a bunch more Ubuntu slaves to hudson.zones out of a pool
>> of publicly IP'd yahoo.net machines my employer has for Hadoop
>> related builds.
>
> So -- what's the situation with this proposal? I'm all in favour.
>
> I've been monitoring Hudson closely for the past 2 weeks, and it's
> clear that it's over-capacity. Even with the limiting band-aids I've
> been putting in place to control overlong builds, right now, the
> build queue has 8 pending builds waiting for a free executor, and
> that's been pretty much the normal situation.
>
> It needs more machines. Paul, are you still -1?
>
> --j.
>
>> Cheers,
>> Nige
>>
>> On Jun 30, 2009, at 6:17 AM, Justin Mason wrote:
>>
>>> On Tue, Jun 30, 2009 at 13:46, sebb wrote:
>>>>
>>>> On 30/06/2009, Jukka Zitting wrote:
>>>>> Hi,
>>>>>
>>>>> Another Tuscany-2x build [1] was stuck with lots of OOM errors
>>>>> and other failures in the console log. I killed the build as it
>>>>> was taking already almost 7 hours, which is much more than the
>>>>> 40 minutes used by the last successful build.
>>>>>
>>>>> [1] http://hudson.zones.apache.org/hudson/job/Tuscany-2x/116/
>>>>
>>>> It looked to me as though the build was stalled, i.e. Hudson was
>>>> not able to detect/recover from the situation. Is this a known
>>>> problem?
>>>>
>>>> Is there any way to give the builds a bit more memory?
>>>>
>>>> It looks like Tuscany has not built successfully for a long
>>>> while, so this is likely to keep happening.
>>>>
>>>> It's a pity that the console output does not have time-stamps,
>>>> or it would be a lot easier to tell that nothing was happening.
>>>
>>> It could be the entire machine was under memory pressure, given
>>> those OOM errors. I wonder if that caused the Hudson master to
>>> get confused.
>>>
>>> --j.
>
> --
> --j.