this saturday (oct 17) we started getting alerts on the jenkins workers that various processes were dying (specifically ssh).
since then, we've had half of our workers OOM due to java processes, and have now had to reboot two of them (-05 and -06). if we look at the machine that's currently wedged (amp-jenkins-worker-06), we see the following builds:

https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=spark-test/3814/
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/HADOOP_VERSION=2.0.0-mr1-cdh4.1.2,label=spark-test/4508/
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/HADOOP_VERSION=1.2.1,label=spark-test/4508/
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3868/
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Compile-Master-Maven-with-YARN/4510/

have there been any changes to any of these builds that might have caused this? anyone have any ideas?

sadly, even though i saw that -06 was about to OOM and got a shell open before SSH died, my command prompt is completely unresponsive. :(

shane