We have a fairly complex application that runs on Spark Standalone. In some cases, tasks from one of the workers block randomly for an indefinite amount of time in the RUNNING state. <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27693/SparkStandaloneIssue.png>
Extra info:
- there aren't any errors in the logs
- I ran with the logger in DEBUG and didn't see any relevant messages (I can see when a task starts, but then there is no further activity for it)
- the jobs work fine if I have only 1 worker
- the same job may execute a second time without any issues, in a reasonable amount of time
- I don't have any really big partitions that could cause delays for some of the tasks
- in Spark 2.0 I moved from RDDs to Datasets and I have the same issue
- in Spark 1.4 I was able to work around the issue by turning on speculation (see the snippet below), but in Spark 2.0 the blocked tasks come from different workers (while in 1.4 the blocked tasks were all on 1 worker), so speculation doesn't fix my issue
- I have the issue on multiple environments, so I don't think it's hardware related

Has anyone experienced something similar? Any suggestions on how I could identify the issue?

Thanks a lot!
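For reference, this is roughly how speculation was turned on (a minimal sketch: "MyApp" and the threshold values are placeholders for illustration; the spark.speculation* keys are standard Spark configuration properties):

  import org.apache.spark.sql.SparkSession

  // Sketch only: enable speculative re-launch of straggler tasks.
  val spark = SparkSession.builder()
    .appName("MyApp")                                // placeholder app name
    .config("spark.speculation", "true")             // re-launch tasks that run much slower than their peers
    .config("spark.speculation.quantile", "0.75")    // fraction of tasks that must finish before speculation kicks in (Spark default)
    .config("spark.speculation.multiplier", "1.5")   // how many times slower than the median before re-launch (Spark default)
    .getOrCreate()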
