Maybe the machines are running out of memory? When memory runs short, the Linux kernel's OOM killer terminates processes to free some.
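
One way to test that theory is to look through the kernel log on a slave for traces of the OOM killer around the time a batch of aborts happened. Below is a minimal sketch in Python, not a definitive check. It assumes `dmesg` is available and readable on the slave, and the two substrings it searches for are only the usual markers; the exact wording differs between kernel versions. The function names are made up for this example.

```python
import re
import subprocess
from typing import List

# Substrings the kernel typically logs around an OOM kill.
# Assumption: the exact wording varies by kernel version, so this is a loose match.
OOM_PATTERN = re.compile(r"invoked oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(kernel_log: str) -> List[str]:
    """Return the lines of a kernel log that look like OOM-killer activity."""
    return [line for line in kernel_log.splitlines() if OOM_PATTERN.search(line)]


def read_kernel_log() -> str:
    """Read the kernel ring buffer via dmesg (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Print every suspicious line; no output means no match in the current buffer.
    for line in find_oom_lines(read_kernel_log()):
        print(line)
```

If this prints nothing on any of the slaves, memory pressure is probably not the cause, although the ring buffer may already have rotated past the time of the aborts.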

On 14-08-12 10:48, Lukas Rytz wrote:


On Tuesday, August 14, 2012 10:45:38 AM UTC+2, Richard Bywater wrote:

    Wild guess but are the builds happening on a Windows based slave and
    is someone logging out whilst the builds are running?


Thanks for the pointer! But that cannot be it - they are all Linux slaves running
with the SSH Slaves Plugin, and they are dedicated machines; nobody is
interacting with them.


    I've had problems in the past with this (it's a thing you can get
    around by passing the right argument -- -Xrs I think from memory)

    Might be nowhere near the issue but just in case :)

    Cheers
    Richard.

    On Tue, Aug 14, 2012 at 8:41 PM, Lukas Rytz <lukas...@epfl.ch> wrote:
    > Well, that's unfortunately not the case. I changed our setup to never run
    > builds of the same job on the same machine in parallel, but the aborts
    > still happen. Just less often.
    >
    > The aborts always come in batches. The last batch was 48 aborts at the
    > same time, each producing the same message in the Jenkins log (see first
    > post).
    >
    > I'm mostly wondering whether anyone else has ever experienced this problem.
    >
    > Lukas
    >
    >
    > On Sunday, July 29, 2012 11:55:37 AM UTC+2, Lukas Rytz wrote:
    >>
    >> Further observation: it seems to happen only when running multiple
    >> concurrent builds of the same job on the same slave (but not when running
    >> multiple builds on separate slaves, at least it seems that way currently).
    >>
    >>
    >>
    >>
    >> On Saturday, July 28, 2012 3:04:41 PM UTC+2, Lukas Rytz wrote:
    >>>
    >>> Hi all,
    >>>
    >>>
    >>> Lately we see quite a lot of jobs (~10 %) that just abort without any
    >>> intervention. Has anybody else had similar problems?
    >>>
    >>> No error message in the console output:
    >>>
    >>> [...]
    >>> [partest] testing: [...]/run/reflection-constructormirror-nested-good.scala [ OK ]
    >>> [partest] testing: [...]/files/run/viewtest.scala [ OK ]
    >>> [partest] testing: [...]/files/run/reify_newimpl_20.scala [ OK ]
    >>> Build was aborted
    >>> Archiving artifacts
    >>> Checking console output
    >>> Email was triggered for: Aborted
    >>> Sending email for trigger: Aborted
    >>>
    >>> The abort is not because of a timeout (build timeout plugin).
    >>> The Jenkins logs say that the abort is due to an uncaught
    >>> InterruptedException, stack trace below. It always looks the same.
    >>>
    >>> I think the reason is an InterruptedException in master-slave
    >>> communication. The slaves are
    >>> connected over SSH using the "SSH Slaves Plugin".
    >>>
    >>> I don't think that the exception is caused by our testing tool - this is
    >>> running on the client in another (JVM) process, so even if it quits with
    >>> an InterruptedException, that should not abort the Jenkins build.
    >>>
    >>>
    >>> Thanks for any pointers!
    >>> Lukas
    >>>
    >>>
    >>>
    >>> Jenkins Log:
    >>>
    >>> INFO: scala-checkin #6609 aborted
    >>> java.lang.InterruptedException
    >>>   at java.lang.Object.wait(Native Method)
    >>>   at hudson.remoting.Request.call(Request.java:146)
    >>>   at hudson.remoting.Channel.call(Channel.java:663)
    >>>   at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
    >>>   at $Proxy36.join(Unknown Source)
    >>>   at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:861)
    >>>   at hudson.Launcher$ProcStarter.join(Launcher.java:345)
    >>>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
    >>>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
    >>>   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
    >>>   at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
    >>>   at hudson.model.Build$BuildExecution.build(Build.java:199)
    >>>   at hudson.model.Build$BuildExecution.doRun(Build.java:160)
    >>>   at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
    >>>   at hudson.model.Run.execute(Run.java:1488)
    >>>   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    >>>   at hudson.model.ResourceController.execute(ResourceController.java:88)
    >>>   at hudson.model.Executor.run(Executor.java:236)
    >>>
    >>>
    >>>
    >

