I have a Spark job that seems to run fine, but after an hour or so executors
start getting lost because of a timeout, with an error something like the
following:

cluster.YarnScheduler : Removing executor 14 (650000 ms exceeds timeout of
600000 ms)

Because of the above error, a chain of follow-on errors starts to appear:
FetchFailedException, RPC client disassociated, Connection reset by peer,
IOException, etc.
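For context, one thing I am considering is simply raising the network/heartbeat
timeouts so executors are not declared dead during long shuffle fetches. A
minimal sketch of what I mean is below; the property names are standard Spark
settings, but the app name and values are placeholders I have not validated for
this job:

    import org.apache.spark.{SparkConf, SparkContext}

    object TimeoutTuningSketch {
      def main(args: Array[String]): Unit = {
        // Sketch only: raise the timeouts that seem to be tripping.
        // spark.network.timeout also drives several derived RPC/shuffle timeouts.
        val conf = new SparkConf()
          .setAppName("large-shuffle-job")                    // placeholder name
          .set("spark.network.timeout", "800s")               // default is far lower
          .set("spark.executor.heartbeatInterval", "60s")     // keep well below the timeout
          .set("spark.yarn.executor.memoryOverhead", "2048")  // extra off-heap headroom in MB

        val sc = new SparkContext(conf)
        // ... the actual shuffle-heavy job would run here ...
        sc.stop()
      }
    }

The same settings could of course be passed with --conf on spark-submit; I
mention them only because the lost-executor message looks like a heartbeat
timeout, not because I know they fix the underlying problem.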

Please see the Spark UI screenshot linked below. I have noticed that once
shuffle read/write grows beyond 10 GB, executors start getting lost because of
the timeout. How do I clear this 10 GB that piles up in the shuffle read/write
columns? I don't cache anything, so why is Spark not clearing that memory?
Please guide.

IMG_20150819_231418358.jpg
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n24345/IMG_20150819_231418358.jpg>
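
Since the subject mentions skewed data: the only skew mitigation I have
sketched so far is salting the hot keys before the shuffle-heavy aggregation,
roughly as below. The keys, data, and object name are made up purely to
illustrate the idea; it is not what the job actually does:

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.util.Random

    object SaltedAggregationSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("salted-aggregation-sketch"))

        // Made-up skewed (key, value) pairs; in the real job this would be the
        // RDD that feeds the 10+ GB shuffle.
        val pairs = sc.parallelize(Seq.fill(100000)(("hotKey", 1)) ++ Seq(("coldKey", 1)))

        // Step 1: append a random salt so one hot key is spread over many reduce tasks.
        val salted = pairs.map { case (k, v) => ((k, Random.nextInt(100)), v) }

        // Step 2: partial aggregation per salted key, then drop the salt and
        // aggregate again to get the final per-key totals.
        val totals = salted
          .reduceByKey(_ + _)
          .map { case ((k, _), v) => (k, v) }
          .reduceByKey(_ + _)

        totals.collect().foreach(println)
        sc.stop()
      }
    }

The hope is that spreading the hot keys keeps each reduce task's shuffle read
bounded, which might also keep the heartbeats flowing, but I have not confirmed
this helps with the timeouts.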
  



