Re: Move builds off of Hudson master

2009-07-17 Thread Justin Mason
On Thu, Jul 2, 2009 at 18:36, Nigel Daley wrote:
> Folks,
>
> I'd really like to move builds off the Hudson master.  Here's a proposal:
>
> 1) We move the Hadoop related builds (Common, HDFS, Mapreduce, Pig,
> ZooKeeper, Hive, HBase, Chukwa, Avro) off to some other machines (see 4
> below)
>
> 2) That would free up minerva and vesta as Ubuntu build slaves for all the
> other projects (which should be more than enough capacity).
>
> 3) We get permission to use the current lucene.zones slave as a Solaris
> build slave for those projects that really want a Solaris build (how many is
> that I wonder?)
>
> 4) We add a bunch more Ubuntu slaves to hudson.zones out of a pool of
> publicly IP'd yahoo.net machines my employer has for Hadoop related builds.

So -- what's the situation with this proposal?

I'm all in favour.  I've been monitoring Hudson closely for the past 2
weeks, and it's clear that it's over-capacity. Even with the limiting
band-aids I've been putting in place to control overlong builds, right
now, the build queue has 8 pending builds waiting for a free executor,
and that's been pretty much the normal situation.  It needs more
machines.

Paul, are you still -1?

--j.


> Cheers,
> Nige
>
>
> On Jun 30, 2009, at 6:17 AM, Justin Mason wrote:
>
>> On Tue, Jun 30, 2009 at 13:46, sebb wrote:
>>>
>>> On 30/06/2009, Jukka Zitting wrote:
>>>>
>>>> Hi,
>>>>
>>>> Another Tuscany-2x build [1] was stuck with lots of OOM errors and
>>>> other failures in the console log. I killed the build as it was taking
>>>> already almost 7 hours, which is much more than the 40 minutes used by
>>>> the last successful build.
>>>>
>>>> [1] http://hudson.zones.apache.org/hudson/job/Tuscany-2x/116/
>>>
>>> It looked to me as though the build was stalled, i.e. Hudson was not
>>> able to detect/recover from the situation. Is this a known problem?
>>>
>>> Is there any way to give the builds a bit more memory?
>>>
>>> It looks like Tuscany has not built successfully for a long while, so
>>> this is likely to keep happening.
>>>
>>> It's a pity that the console output does not have time-stamps, or it
>>> would be a lot easier to tell that nothing was happening.
>>
>> It could be the entire machine was under memory pressure, given those
>> OOM errors.  I wonder if that caused the Hudson master to get
>> confused.
>>
>> --j.
>
>



-- 
--j.


Re: Move builds off of Hudson master

2009-07-17 Thread Nigel Daley
FWIW, I'm still working on getting the yahoo.net machines properly
imaged.  Hoping to have them ready when I get back from vacation the
week of July 27.


Nige

On Jul 17, 2009, at 9:15 AM, Justin Mason wrote:

> On Thu, Jul 2, 2009 at 18:36, Nigel Daley wrote:
>> Folks,
>>
>> I'd really like to move builds off the Hudson master.  Here's a proposal:
>>
>> 1) We move the Hadoop related builds (Common, HDFS, Mapreduce, Pig,
>> ZooKeeper, Hive, HBase, Chukwa, Avro) off to some other machines (see 4
>> below)
>>
>> 2) That would free up minerva and vesta as Ubuntu build slaves for all the
>> other projects (which should be more than enough capacity).
>>
>> 3) We get permission to use the current lucene.zones slave as a Solaris
>> build slave for those projects that really want a Solaris build (how many is
>> that I wonder?)
>>
>> 4) We add a bunch more Ubuntu slaves to hudson.zones out of a pool of
>> publicly IP'd yahoo.net machines my employer has for Hadoop related builds.
>
> So -- what's the situation with this proposal?
>
> I'm all in favour.  I've been monitoring Hudson closely for the past 2
> weeks, and it's clear that it's over-capacity. Even with the limiting
> band-aids I've been putting in place to control overlong builds, right
> now, the build queue has 8 pending builds waiting for a free executor,
> and that's been pretty much the normal situation.  It needs more
> machines.
>
> Paul, are you still -1?
>
> --j.
>
>> Cheers,
>> Nige
>>
>> On Jun 30, 2009, at 6:17 AM, Justin Mason wrote:
>>
>>> On Tue, Jun 30, 2009 at 13:46, sebb wrote:
>>>>
>>>> On 30/06/2009, Jukka Zitting wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Another Tuscany-2x build [1] was stuck with lots of OOM errors and
>>>>> other failures in the console log. I killed the build as it was taking
>>>>> already almost 7 hours, which is much more than the 40 minutes used by
>>>>> the last successful build.
>>>>>
>>>>> [1] http://hudson.zones.apache.org/hudson/job/Tuscany-2x/116/
>>>>
>>>> It looked to me as though the build was stalled, i.e. Hudson was not
>>>> able to detect/recover from the situation. Is this a known problem?
>>>>
>>>> Is there any way to give the builds a bit more memory?
>>>>
>>>> It looks like Tuscany has not built successfully for a long while, so
>>>> this is likely to keep happening.
>>>>
>>>> It's a pity that the console output does not have time-stamps, or it
>>>> would be a lot easier to tell that nothing was happening.
>>>
>>> It could be the entire machine was under memory pressure, given those
>>> OOM errors.  I wonder if that caused the Hudson master to get
>>> confused.
>>>
>>> --j.
>
> --
> --j.