Hi All,

After switching from standalone Spark to Mesos I'm experiencing some instability. I'm running pyspark interactively through an IPython notebook, and I get the crash below non-deterministically (although pretty reliably within the first 2000 tasks, often much sooner).
Exception in thread "DAGScheduler" org.apache.spark.SparkException: EOF reached before Python server acknowledged
        at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:340)
        at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:311)
        at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:70)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:253)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:251)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
        at org.apache.spark.Accumulators$.add(Accumulators.scala:251)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:662)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:437)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

I'm running the following software versions on all machines:

* Spark: 0.8.1 (md5: 5d3c56eaf91c7349886d5c70439730b3)
* Mesos: 0.13.0 (md5: 220dc9c1db118bc7599d45631da578b9)
* Python: 2.7.3 (Stack Overflow mentions that differing Python versions may be to blame; mine are all the same, unless Spark or IPython is specifically invoking an older version under the hood)
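To rule out the version-mismatch theory, here's a quick sanity check I can run; it's a minimal sketch, assuming passwordless SSH, and the hostnames in the commented loop (mesos-master, mesos-slave1, mesos-slave2) are placeholders, not my actual machines:

```shell
# Print the Python version and which interpreter is first on PATH.
# Note: on Python 2.x, "python -V" writes to stderr, hence the 2>&1.
python -V 2>&1
command -v python

# Hypothetical hostnames -- substitute your actual master and slaves.
# Uncomment to run the same check on every node over SSH:
# for h in mesos-master mesos-slave1 mesos-slave2; do
#   echo "== $h =="
#   ssh "$h" 'python -V 2>&1; command -v python'
# done
```

If any node reports a different version (or a different interpreter path) than the driver, that mismatch would be worth eliminating before digging into the Mesos side.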
All machines run Ubuntu 12.04.

I've modified mesos-daemon.sh as follows: I had problems launching the cluster with mesos-start-cluster.sh and traced the problem to (what seemed to be) a bug in mesos-daemon.sh, which used a "--conf" flag that mesos-slave and mesos-master didn't recognize. I removed the flag and instead added code to read in environment variables from mesos-deploy-env.sh. mesos-start-cluster.sh then worked as advertised.

In case it's helpful, I've included several files:

* spark_full_output <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/spark_full_output>: output of the IPython process where the SparkContext was created
* mesos-deploy-env.sh <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-deploy-env.sh>: Mesos config file from a slave (identical to the master's except for MESOS_MASTER)
* spark-env.sh <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/spark-env.sh>: Spark config file
* mesos-master.INFO <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-master.INFO>: log file from mesos-master
* mesos-master.WARNING <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-master.WARNING>: log file from mesos-master
* mesos-daemon.sh <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-daemon.sh>: my modified version of mesos-daemon.sh

In case anybody from Berkeley is interested enough to want to interact with my deployment, my office is in Soda Hall, so that can definitely be arranged.

-Brad Miller

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-crash-on-mesos-tp2255.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.