Maybe the machines are running out of memory? I have heard of the Linux
OOM killer terminating processes to free memory.
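If the OOM killer were responsible, the process it picked would die from SIGKILL, and the kernel log (`dmesg`) would normally contain an out-of-memory message naming the killed process. Below is a minimal Java sketch of how such a kill looks to a Java parent. It uses a hypothetical demo class (`SigkillExitDemo`), is not Jenkins code, and assumes a Unix-like host with `sleep` on the PATH. The parent gets no exception; it only sees an exit value of 128 + 9 = 137.

```java
import java.io.IOException;

// Hypothetical demo class, not part of Jenkins: shows how a process killed with SIGKILL
// (the signal the Linux OOM killer sends) looks to a Java parent process.
// Assumes a Unix-like host with a `sleep` binary on the PATH.
public class SigkillExitDemo {

    static int killedExitCode() throws IOException, InterruptedException {
        // Start a long-running child process
        Process child = new ProcessBuilder("sleep", "30").start();
        // On Unix, destroyForcibly() delivers SIGKILL, which cannot be caught or ignored
        child.destroyForcibly();
        // waitFor() returns once the child has terminated
        child.waitFor();
        // The JDK reports death-by-signal as 128 + signal number; SIGKILL is 9
        return child.exitValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("exit value after SIGKILL: " + killedExitCode()); // prints 137 on Linux
    }
}
```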
On 14-08-12 10:48, Lukas Rytz wrote:
On Tuesday, August 14, 2012 10:45:38 AM UTC+2, Richard Bywater wrote:
Wild guess, but are the builds happening on a Windows-based slave, and
is someone logging out whilst the builds are running?
Thanks for the pointer! But that cannot be it: they are all Linux
slaves running with the SSH Slaves Plugin, and they are dedicated
machines; nobody is interacting with them.
I've had problems with this in the past (it's something you can get
around by passing the right JVM argument: -Xrs, I think, from memory).
Might be nowhere near the issue, but just in case :)
Cheers
Richard.
On Tue, Aug 14, 2012 at 8:41 PM, Lukas Rytz <lukas...@epfl.ch> wrote:
> Well, that's unfortunately not the case. I changed our setup to never
> run builds of the same job on the same machine in parallel, but the
> aborts still happen, just less often.
>
> The aborts always come in batches. The last batch was 48 aborts at the
> same time, each producing the same message in the Jenkins log (see
> first post).
>
> I'm mostly wondering whether no one else has ever experienced this problem.
>
> Lukas
>
>
> On Sunday, July 29, 2012 11:55:37 AM UTC+2, Lukas Rytz wrote:
>>
>> Further observation: it seems to happen only when running multiple
>> concurrent builds of the same job on the same slave (but not when
>> running multiple builds on separate slaves; at least it seems that
>> way currently).
>>
>>
>>
>>
>> On Saturday, July 28, 2012 3:04:41 PM UTC+2, Lukas Rytz wrote:
>>>
>>> Hi all,
>>>
>>>
>>> Lately we see quite a lot of jobs (~10%) that just abort without any
>>> intervention. Has anybody else ever had similar problems?
>>>
>>> No error message in the console output:
>>>
>>> [...]
>>> [partest] testing: [...]/run/reflection-constructormirror-nested-good.scala [ OK ]
>>> [partest] testing: [...]/files/run/viewtest.scala [ OK ]
>>> [partest] testing: [...]/files/run/reify_newimpl_20.scala [ OK ]
>>> Build was aborted
>>> Archiving artifacts
>>> Checking console output
>>> Email was triggered for: Aborted
>>> Sending email for trigger: Aborted
>>>
>>> The abort is not caused by a timeout (build timeout plugin).
>>> The Jenkins logs say that the abort is due to an uncaught
>>> InterruptedException (stack trace below). It always looks the same.
>>>
>>> I think the cause is an InterruptedException in the master-slave
>>> communication. The slaves are connected over SSH using the
>>> "SSH Slaves Plugin".
>>>
>>> I don't think the exception is caused by our testing tool: it runs
>>> on the client in a separate JVM process, so even if it quit with an
>>> InterruptedException, that should not abort the Jenkins build.
>>>
>>>
>>> Thanks for any pointers!
>>> Lukas
>>>
>>>
>>>
>>> Jenkins Log:
>>>
>>> INFO: scala-checkin #6609 aborted
>>> java.lang.InterruptedException
>>>     at java.lang.Object.wait(Native Method)
>>>     at hudson.remoting.Request.call(Request.java:146)
>>>     at hudson.remoting.Channel.call(Channel.java:663)
>>>     at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
>>>     at $Proxy36.join(Unknown Source)
>>>     at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:861)
>>>     at hudson.Launcher$ProcStarter.join(Launcher.java:345)
>>>     at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
>>>     at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
>>>     at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
>>>     at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
>>>     at hudson.model.Build$BuildExecution.build(Build.java:199)
>>>     at hudson.model.Build$BuildExecution.doRun(Build.java:160)
>>>     at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
>>>     at hudson.model.Run.execute(Run.java:1488)
>>>     at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
>>>     at hudson.model.ResourceController.execute(ResourceController.java:88)
>>>     at hudson.model.Executor.run(Executor.java:236)
>>>
>>>
>>>
>
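The top frames of the stack trace in the first post show the build's executor thread blocked in `Object.wait` inside `hudson.remoting.Request.call`, waiting for the slave's answer to the remote `join`, when it received the interrupt. Below is a minimal, self-contained Java sketch of that mechanism. It uses a hypothetical demo class (`WaitInterruptDemo`) and is not Jenkins code. A thread waiting on a monitor throws `InterruptedException` as soon as another thread calls `interrupt()` on it. So the exception in the log says *that* the executor thread was interrupted, not *which* component interrupted it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not Jenkins code: a thread blocked in Object.wait(), like the
// executor thread in hudson.remoting.Request.call, throws InterruptedException when
// another thread calls interrupt() on it.
public class WaitInterruptDemo {

    static String run() throws InterruptedException {
        final Object lock = new Object();
        final CountDownLatch aboutToWait = new CountDownLatch(1);
        final AtomicReference<String> outcome = new AtomicReference<>("not interrupted");

        Thread executor = new Thread(() -> {
            synchronized (lock) {
                aboutToWait.countDown();
                try {
                    // Stand-in for waiting on a remote reply that never arrives;
                    // the loop guards against spurious wakeups
                    while (true) {
                        lock.wait();
                    }
                } catch (InterruptedException e) {
                    outcome.set("InterruptedException");
                }
            }
        }, "executor");

        executor.start();
        aboutToWait.await();
        // wait() throws whether the interrupt arrives just before or during the wait
        executor.interrupt();
        executor.join();
        return outcome.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints "InterruptedException"
    }
}
```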