Hi All,

After switching from standalone Spark to Mesos, I'm experiencing some
instability.  I'm running pyspark interactively through an IPython notebook
and get the crash below non-deterministically (although it reliably shows up
within the first 2000 tasks, often much sooner).

Exception in thread "DAGScheduler" org.apache.spark.SparkException: EOF reached before Python server acknowledged
        at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:340)
        at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:311)
        at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:70)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:253)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:251)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
        at org.apache.spark.Accumulators$.add(Accumulators.scala:251)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:662)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:437)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

I'm running the following software versions on all machines:
Spark: 0.8.1  (md5: 5d3c56eaf91c7349886d5c70439730b3)
Mesos: 0.13.0  (md5: 220dc9c1db118bc7599d45631da578b9)
Python: 2.7.3  (Stack Overflow posts suggest that differing Python versions
may be to blame, but unless Spark or IPython is invoking an older version
under the hood, mine are all the same)
Ubuntu: 12.04
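
One way to check the Python-version hypothesis from inside the notebook is
sketched below.  It is only a sketch: it assumes `sc` is the live
SparkContext, and the helper names (`python_version_tag`,
`collect_worker_versions`, `versions_match`) are hypothetical, not part of
Spark.  It runs a batch of trivial tasks and reports which interpreter
versions the workers actually used.

```python
import sys


def python_version_tag():
    """Return 'major.minor.micro' for the interpreter executing this call."""
    return "%d.%d.%d" % tuple(sys.version_info[:3])


def collect_worker_versions(sc, num_tasks=100):
    """Run trivial tasks and return the distinct Python versions seen on workers.

    `sc` is assumed to be an existing SparkContext.  Using one partition per
    element makes it more likely that the tasks are spread over many slaves.
    """
    return (sc.parallelize(range(num_tasks), num_tasks)
              .map(lambda _: python_version_tag())
              .distinct()
              .collect())


def versions_match(driver_version, worker_versions):
    """True only if every worker reported exactly the driver's version."""
    return set(worker_versions) == set([driver_version])
```

In the notebook this would be called as
`versions_match(python_version_tag(), collect_worker_versions(sc))`.  A False
result would point at a mismatched interpreter on some slave; a True result
does not rule out other causes of the crash.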

I've modified mesos-daemon.sh as follows:
I had problems launching the cluster with mesos-start-cluster.sh and traced
the problem to (what seemed to be) a bug in mesos-daemon.sh which used a
"--conf" flag that mesos-slave and mesos-master didn't recognize.  I removed
the flag and instead added code to read in environment variables from
mesos-deploy-env.sh.  mesos-start-cluster.sh then worked as advertised.

In case it's helpful, I've included several files as follows:
* spark_full_output: output of ipython process where SparkContext was created
  <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/spark_full_output>
* mesos-deploy-env.sh: mesos config file from slave (identical to master except for MESOS_MASTER)
  <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-deploy-env.sh>
* spark-env.sh: spark config file
  <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/spark-env.sh>
* mesos-master.INFO: log file from mesos-master
  <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-master.INFO>
* mesos-master.WARNING: log file from mesos-master
  <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-master.WARNING>
* mesos-daemon.sh: my modified version of mesos-daemon.sh
  <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-daemon.sh>

In case anybody from Berkeley is interested enough to want to interact with
my deployment, my office is in Soda Hall, so that can definitely be arranged.

-Brad Miller



