starting this saturday (oct 17), we began getting alerts on the
jenkins workers that various processes were dying (specifically ssh).

since then, half of our workers have OOMed due to java processes,
and we've now had to reboot two of them (-05 and -06).

if we look at the machine that's currently wedged (amp-jenkins-worker-06), we see these builds:
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=spark-test/3814/
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/HADOOP_VERSION=2.0.0-mr1-cdh4.1.2,label=spark-test/4508/
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/HADOOP_VERSION=1.2.1,label=spark-test/4508/
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3868/
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Compile-Master-Maven-with-YARN/4510/

have there been any changes to any of these builds that might have
caused this?  anyone have any ideas?

sadly, even though i saw that -06 was about to OOM and got a shell
open before SSH died, my command prompt is completely unresponsive.
:(

shane

