ok, based on the timing, i *think* this might be the culprit:

https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=spark-test/3814/console

On Tue, Oct 20, 2015 at 3:35 PM, shane knapp <skn...@berkeley.edu> wrote:
> -06 just kinda came back...
>
> [root@amp-jenkins-worker-06 ~]# uptime
>  15:29:07 up 26 days,  7:34,  2 users,  load average: 1137.91, 1485.69, 1635.89
>
> the builds that, from looking at the process table, seem to be at
> fault are the Spark-Master-Maven-pre-yarn matrix builds, and possibly
> a Spark-Master-SBT matrix build.  look at the build history here:
> https://amplab.cs.berkeley.edu/jenkins/computer/amp-jenkins-worker-06/builds
>
> the load is dropping significantly and quickly, but swap is borked and
> i'm still going to reboot.
>
> On Tue, Oct 20, 2015 at 3:24 PM, shane knapp <skn...@berkeley.edu> wrote:
>> starting this saturday (oct 17) we started getting alerts on the
>> jenkins workers that various processes were dying (specifically ssh).
>>
>> since then, we've had half of our workers OOM due to java processes
>> and have now had to reboot two of them (-05 and -06).
>>
>> if we look at the current machine that's wedged (amp-jenkins-worker-06), we see:
>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=spark-test/3814/
>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/HADOOP_VERSION=2.0.0-mr1-cdh4.1.2,label=spark-test/4508/
>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/HADOOP_VERSION=1.2.1,label=spark-test/4508/
>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3868/
>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Compile-Master-Maven-with-YARN/4510/
>>
>> have there been any changes to any of these builds that might have
>> caused this?  anyone have any ideas?
>>
>> sadly, even though i saw that -06 was about to OOM and got a shell
>> opened before SSH died, my command prompt is completely unresponsive.
>> :(
>>
>> shane

