Hi all,

Configuration: Standalone 0.9.1-cdh4 cluster, 7 workers per node, 32 GB per worker.
I'm running a job on a Spark cluster and running into some strange behavior. After a while, the Akka frame sizes exceed 10 MB, and then the whole job seizes up. I set "spark.akka.frameSize" to 128 in the SparkConf used to create the SparkContext (and also set it as a Java system property on the driver, for good measure). After this, the program no longer hung, but it failed immediately and logged error messages like the following:

(on the master):

14/05/20 21:49:50 INFO SparkDeploySchedulerBackend: Executor 1 disconnected, so removing it
14/05/20 21:49:50 ERROR TaskSchedulerImpl: Lost executor 1 on [...]: remote Akka client disassociated

(on the workers):

14/05/20 21:50:25 WARN SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/05/20 21:50:25 INFO SparkDeploySchedulerBackend: Shutting down all executors
14/05/20 21:50:25 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
14/05/20 21:50:25 INFO AppClient: Stop request to Master timed out; it may already be shut down.

After a lot of fumbling around, I ended up adding "-Dspark.akka.frameSize=128" to SPARK_JAVA_OPTS in spark-env.sh, under the theory that the workers couldn't read the larger Akka messages. This /seems/ to have made things work, but I'm still a little scared. Is this the standard way to set the max Akka frame size, or is there a way to set it from the driver and have it propagate to the workers?

Thanks,
Matt
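
P.S. For reference, this is roughly what my driver-side setup looks like (a minimal sketch; the master URL and app name below are placeholders, not my real ones):

    import org.apache.spark.{SparkConf, SparkContext}

    // Raise the max Akka frame size (value is in MB) before creating the context.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")   // placeholder master URL
      .setAppName("FrameSizeTest")             // placeholder app name
      .set("spark.akka.frameSize", "128")
    val sc = new SparkContext(conf)

and the line I ended up adding to spark-env.sh on the workers:

    SPARK_JAVA_OPTS="-Dspark.akka.frameSize=128"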