We have a fairly complex application that runs on Spark Standalone.
In some cases, tasks from one of the workers randomly block for an
indefinite amount of time in the RUNNING state.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27693/SparkStandaloneIssue.png>
  


Extra info:
- there aren't any errors in the logs
- I ran with the logger in debug and didn't see any relevant messages (I
see when the task starts, but then there is no activity for it)
- the jobs work fine if I have only 1 worker
- the same job may execute a second time without any issues, in a
reasonable amount of time
- I don't have any really big partitions that could cause delays for some
of the tasks
- in Spark 2.0 I moved from RDDs to Datasets and I have the same issue
- in Spark 1.4 I was able to work around the issue by turning on
speculation (see the sketch after this list), but in Spark 2.0 the blocked
tasks come from different workers (while in 1.4 I had blocked tasks on
only 1 worker), so speculation isn't fixing my issue
- I have the issue in multiple environments, so I don't think it's
hardware related
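
For reference, here is a minimal sketch of how speculation can be turned
on in Spark 2.0 (Scala). The app name and master URL are placeholders,
and the interval/multiplier/quantile values shown are just Spark's
defaults, not the exact settings I used:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("my-app")                              // placeholder app name
    .master("spark://master-host:7077")             // placeholder standalone master
    .config("spark.speculation", "true")            // re-launch slow-running tasks
    .config("spark.speculation.interval", "100ms")  // how often to check for slow tasks
    .config("spark.speculation.multiplier", "1.5")  // "slow" = 1.5x the median task time
    .config("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish first
    .getOrCreate()

The same settings can also be passed to spark-submit with --conf, e.g.
--conf spark.speculation=true.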

Has anyone experienced something similar? Any suggestions on how I could
identify the issue?

Thanks a lot!


