Re: Builds that have been failing for a while

2010-09-24 Thread Niklas Gustavsson
Be warned: on Sunday I'll run the script to disable builds which have been
failing for more than 31 days. This is the current list of such jobs:

ActiveMQ-SysTest-5.3   | 6 mo 25 days
AsyncWeb   | 3 mo 2 days
Cayenne-doc| 1 mo 21 days
clerezza-site  | 3 mo 21 days
Empire-DB multios  | 1 mo 23 days
Felix-FileInstall  | 1 mo 21 days
Felix-Gogo | 2 mo 11 days
Felix-WebConsole   | 1 mo 23 days
Hadoop-20-Build| 2 yr 3 mo
Hadoop-Hdfs-21-Build   | 5 mo 5 days
Hadoop-Hdfs-trunk  | 5 mo 20 days
Hadoop-Mapreduce-21-Build  | 3 mo 25 days
Hadoop-Mapreduce-trunk | 3 mo 25 days
Hadoop-Mapreduce-trunk-Commit  | 5 mo 20 days
Hadoop-Patch-h1.grid.sp2.yahoo.net | 3 mo 21 days
Hadoop-Patch-h4.grid.sp2.yahoo.net | 3 mo 16 days
Hadoop-Patch-h9.grid.sp2.yahoo.net | 8 mo 12 days
Hama-Patch | 2 mo 28 days
Hama-Patch-Admin   | 1 mo 10 days
Hdfs-Patch-h2.grid.sp2.yahoo.net   | 5 mo 21 days
Hdfs-Patch-h5.grid.sp2.yahoo.net   | 5 mo 21 days
Hive-trunk-h0.18   | 1 mo 10 days
Hive-trunk-h0.19   | 4 mo 20 days
Jackrabbit-1.6 | 1 mo 12 days
Jackrabbit-classloader | 3 mo 10 days
Jackrabbit-ocm | 3 mo 10 days
jspf-trunk | 1 mo 10 days
Mahout-Patch-Admin | 1 yr 11 mo
mailet-standard-trunk  | 2 mo 4 days
Mapreduce-Patch-h3.grid.sp2.yahoo.net  | 4 mo 27 days
Mapreduce-Patch-h4.grid.sp2.yahoo.net  | 3 mo 29 days
Mapreduce-Patch-h6.grid.sp2.yahoo.net  | 4 mo 16 days
Mapreduce-Patch-h9.grid.sp2.yahoo.net  | 6 mo 23 days
Nutch-trunk| 2 mo 19 days
org.apache.kato.eclipse| 1 yr 2 mo
Pig-Patch-h7.grid.sp2.yahoo.net| 3 mo 17 days
Pig-Patch-h8.grid.sp2.yahoo.net| 5 mo 0 days
ServiceMix-Plugins | 2 mo 12 days
ServiceMix-Utils   | 2 mo 10 days
ServiceMix3| 1 mo 3 days
Shiro  | 1 mo 13 days
struts-annotations | 1 yr 1 mo
tapestry-5.0-freestyle | 6 mo 11 days
TestBuilds | 1 yr 0 mo
Turbine Fulcrum| 3 mo 21 days
Tuscany-1x | 9 mo 6 days
Tuscany-run-plugin | 3 mo 28 days
Zookeeper-Patch-h1.grid.sp2.yahoo.net  | 2 mo 6 days
Zookeeper-Patch-h7.grid.sp2.yahoo.net  | 1 mo 20 days
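
The 31-day cutoff boils down to comparing each job's last successful build
against a threshold. A minimal sketch of that check (hypothetical; the actual
script is not shown here, and the dates below are illustrative, not the real
job data):

```python
from datetime import datetime, timedelta

DISABLE_AFTER_DAYS = 31  # threshold used by the disable script


def should_disable(last_success, now, threshold_days=DISABLE_AFTER_DAYS):
    """Return True if the job's last successful build is older than the
    threshold. A job that has never succeeded (last_success is None) is
    also a candidate for disabling."""
    if last_success is None:
        return True
    return (now - last_success) > timedelta(days=threshold_days)


now = datetime(2010, 9, 24)
# A job red for ~3 months gets disabled; one green yesterday is kept.
print(should_disable(datetime(2010, 6, 22), now))  # True
print(should_disable(datetime(2010, 9, 23), now))  # False
```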

/niklas


Re: Builds that have been failing for a while

2010-09-24 Thread Jukka Zitting
Hi,

On Fri, Sep 24, 2010 at 10:37 AM, Niklas Gustavsson wrote:
> Be warned: on Sunday I'll run the script to disable builds which have been
> failing for more than 31 days. This is the current list of such jobs:
> [...]
> Jackrabbit-1.6                                     | 1 mo 12 days
> Jackrabbit-classloader                             | 3 mo 10 days
> Jackrabbit-ocm                                     | 3 mo 10 days

These are builds that are configured to run only when there's a change
in the related codebase, so even if they've been red for a long time,
they don't really consume build resources. As soon as someone gets
around to fixing the pending errors, I expect the CI build to start up
again automatically to verify the fix.

I suggest that we only disable *periodic* building of codebases that
have been failing for a long time.

BR,

Jukka Zitting


Re: Builds that have been failing for a while

2010-09-24 Thread Niklas Gustavsson
On Fri, Sep 24, 2010 at 10:48 AM, Jukka Zitting  wrote:
> On Fri, Sep 24, 2010 at 10:37 AM, Niklas Gustavsson wrote:
>> Jackrabbit-1.6                                     | 1 mo 12 days
>> Jackrabbit-classloader                             | 3 mo 10 days
>> Jackrabbit-ocm                                     | 3 mo 10 days
>
> These are builds that are configured to run only when there's a change
> in the related codebase, so even if they've been red for a long time,
> they don't really consume build resources. As soon as someone gets
> around to fixing the pending errors, I expect the CI build to start up
> again automatically to verify the fix.
>
> I suggest that we only disable *periodic* building of codebases that
> have been failing for a long time.

These three builds are set to check for updates on a periodic basis
(polling the SCM every hour) and to run when upstream dependencies are
built.

/niklas


Re: Builds that have been failing for a while

2010-09-24 Thread Jukka Zitting
Hi,

On Fri, Sep 24, 2010 at 11:05 AM, Niklas Gustavsson wrote:
> These three builds are set to check for updates on a periodic basis
> (polling the SCM every hour) and to run when upstream dependencies are
> built.

That shouldn't be too much of a burden, should it? It doesn't tie up
executors like some of the other failing builds.

I'm all for disabling builds that continuously keep failing, but in
these cases only the last build has failed, and I totally expect the
builds to go blue again as soon as someone gets around to touching the
codebases.

Instead of the time limit, would it make more sense to only disable
those jobs where >n of the last builds have failed?

BR,

Jukka Zitting


Re: Builds that have been failing for a while

2010-09-24 Thread Niklas Gustavsson
On Fri, Sep 24, 2010 at 11:24 AM, Jukka Zitting  wrote:
> That shouldn't be too much of a burden, should it? It doesn't tie up
> executors like some of the other failing builds.

It does tie up an SCMTrigger, which is a resource that does fail and does
require administration (triggers get stuck when slaves fail and need to be
killed, or they will keep a thread stuck forever). That said, it is
certainly not as resource intensive as running the full build.

> Instead of the time limit, would it make more sense to only disable
> those jobs where >n of the last builds have failed?

Reasonable idea; let me play around with a script for that purpose and
get back with a new list to compare.
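
A sketch of what such a combined check might look like (hypothetical: the
real script would have to query Hudson for each job's build history, which
is elided here, and the thresholds are only examples):

```python
def should_disable(days_failing, consecutive_failures,
                   min_days=31, min_failures=3):
    """Disable only jobs that are both old failures and repeat offenders.

    Combines the time limit with the last-n-builds idea: the job must have
    been red for more than `min_days` *and* have more than `min_failures`
    unsuccessful builds in a row.
    """
    return days_failing > min_days and consecutive_failures > min_failures


# A job red for 42 days but with only 1 failed build stays enabled;
# one red for 42 days across 13 builds gets disabled.
print(should_disable(42, 1))   # False
print(should_disable(42, 13))  # True
```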

/niklas


RE: Builds that have been failing for a while

2010-09-24 Thread Gav...


> -Original Message-
> From: Jukka Zitting [mailto:jukka.zitt...@gmail.com]
> Sent: Friday, 24 September 2010 7:25 PM
> To: builds@apache.org
> Subject: Re: Builds that have been failing for a while
> 
> Hi,
> 
> On Fri, Sep 24, 2010 at 11:05 AM, Niklas Gustavsson wrote:
> > These three builds are set to check for updates on a periodic basis
> > (polling the SCM every hour) and to run when upstream dependencies are
> > built.
> 
> That shouldn't be too much of a burden, should it? It doesn't tie up
> executors like some of the other failing builds.
> 
> I'm all for disabling builds that continuously keep failing, but in
> these cases only the last build has failed, and I totally expect the
> builds to go blue again as soon as someone gets around to touching the
> codebases.
> 
> Instead of the time limit, would it make more sense to only disable
> those jobs where >n of the last builds have failed?

Depends on the trigger frequency: the last n builds could be used up in
one day by some projects, yet take months to reach for others.

I would suggest a combination of both methods - perhaps a time limit of
30 days AND the last 5 builds failed, or something like that?

This is a new thing that needs doing; we can't have everyone replying
saying 'oh yeah, please don't disable my build due to blah...'. Let's
find a sensible setting and stick to it. The aim is to get people to fix
their builds or they will be disabled until they are fixed, simple.

Gav...

> 
> BR,
> 
> Jukka Zitting




Re: Builds that have been failing for a while

2010-09-24 Thread Niklas Gustavsson
On Fri, Sep 24, 2010 at 11:44 AM, Gav...  wrote:
> I would suggest a combination of both methods - perhaps a time limit of
> 30 days AND the last 5 builds failed, or something like that?

That was my plan as well. Here's the list of jobs that have failed for
more than one month, with more than 3 unsuccessful builds in a row:

Cayenne-doc                            | 1 mo 21 days | 13
clerezza-site                          | 3 mo 21 days | 7
Felix-WebConsole                       | 1 mo 23 days | 7
Hadoop-20-Build                        | 2 yr 3 mo    | 15
Hadoop-Hdfs-21-Build                   | 5 mo 6 days  | 15
Hadoop-Hdfs-trunk                      | 5 mo 20 days | 40
Hadoop-Mapreduce-21-Build              | 3 mo 25 days | 15
Hadoop-Mapreduce-trunk                 | 3 mo 25 days | 38
Hadoop-Mapreduce-trunk-Commit          | 5 mo 20 days | 30
Hadoop-Patch-h4.grid.sp2.yahoo.net     | 3 mo 16 days | 18
Hadoop-Patch-h9.grid.sp2.yahoo.net     | 8 mo 12 days | 8
Hama-Patch-Admin                       | 1 mo 10 days | 5
Hdfs-Patch-h2.grid.sp2.yahoo.net       | 5 mo 21 days | 11
Hdfs-Patch-h5.grid.sp2.yahoo.net       | 5 mo 21 days | 11
Hive-trunk-h0.18                       | 1 mo 10 days | 17
Hive-trunk-h0.19                       | 4 mo 20 days | 17
jspf-trunk                             | 1 mo 10 days | 5
Mahout-Patch-Admin                     | 1 yr 11 mo   | 5
mailet-standard-trunk                  | 2 mo 4 days  | 4
Mapreduce-Patch-h4.grid.sp2.yahoo.net  | 3 mo 29 days | 5
Mapreduce-Patch-h6.grid.sp2.yahoo.net  | 4 mo 16 days | 4
Nutch-trunk                            | 2 mo 19 days | 40
Pig-Patch-h7.grid.sp2.yahoo.net        | 3 mo 17 days | 23
Pig-Patch-h8.grid.sp2.yahoo.net        | 5 mo 0 days  | 33
Shiro                                  | 1 mo 13 days | 4
tapestry-5.0-freestyle                 | 6 mo 11 days | 5
Zookeeper-Patch-h7.grid.sp2.yahoo.net  | 1 mo 20 days | 11

/niklas


Re: Builds that have been failing for a while

2010-09-24 Thread Tammo van Lessen
Hi,

On 24.09.2010 14:20, Niklas Gustavsson wrote:
> On Fri, Sep 24, 2010 at 11:44 AM, Gav...  wrote:
>> I would suggest a combination of both methods - perhaps a time limit of
>> 30 days AND the last 5 builds failed, or something like that?
> 
> That was my plan as well. Here's the list of jobs that have failed for
> more than one month, with more than 3 unsuccessful builds in a row:
> 
[...]

Perhaps it might make sense to also notify the respective PMCs and
advertise this mailing list, as I could imagine that some PMCs are not
even aware of it.

Tammo

-- 
Tammo van Lessen - http://www.taval.de


Re: Builds that have been failing for a while

2010-09-24 Thread Tommaso Teofili
2010/9/24 Tammo van Lessen 

>
> Perhaps it might make sense to also notify the respective PMCs and
> advertise this mailing list, as I could imagine that some PMCs are not
> even aware of it.
>
> Tammo
>
> --
> Tammo van Lessen - http://www.taval.de
>

+1
Regarding the clerezza-site job, I am taking a look at possible causes of the failures.
Regards,
Tommaso


Re: Builds that have been failing for a while

2010-09-24 Thread Niklas Gustavsson
On Fri, Sep 24, 2010 at 2:36 PM, Tammo van Lessen  wrote:
> Perhaps it might make sense to also notify the respective PMCs and
> advertise this mailing list, as I could imagine that some PMCs are not
> even aware of it.

We do ask those who get access to Hudson to follow this list for exactly
this purpose.

/niklas


SSH problem again

2010-09-24 Thread Stefan Seelmann
Hi,

The SSH problem is there again.

Building remotely on ubuntu1
hudson.util.IOException2: remote file operation failed:
/home/hudson/hudson-slave/workspace/dir-skins-jdk15-ubuntu-deploy-site
at hudson.remoting.chan...@75167bb3:ubuntu1
at hudson.FilePath.act(FilePath.java:749)
at hudson.FilePath.act(FilePath.java:735)
at hudson.FilePath.mkdirs(FilePath.java:801)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1059)
at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:479)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:411)
at hudson.model.Run.run(Run.java:1273)
at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:291)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:129)
Caused by: java.io.IOException: SSH channel is closed. (Close
requested by remote)
at com.trilead.ssh2.channel.ChannelManager.sendData(ChannelManager.java:383)
at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:63)
at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1838)
at java.io.ObjectOutputStream$BlockDataOutputStream.writeByte(ObjectOutputStream.java:1876)
at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1537)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:329)
at hudson.remoting.Channel.send(Channel.java:419)
at hudson.remoting.Request.call(Request.java:105)
at hudson.remoting.Channel.call(Channel.java:557)
at hudson.FilePath.act(FilePath.java:742)
... 9 more

Kind Regards,
Stefan