The documentation says that this setting is used to disable the Akka
transport failure detector.
Why is the magic number 6000s used, then?
To disable the heartbeat, it should be the maximum possible value instead
of 6000s. Magic numbers like 6000s (1 hour 40 min) create issues which are
difficult to debug. Most prob
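The arithmetic behind this complaint is easy to check: 6000 s is exactly 1 h 40 min, the same time after which the jobs in this thread fail. The sketch below uses a hypothetical fixed-threshold detector (`detector_fires` is illustrative only, not Spark or Akka code) to show why a finite pause value only postpones a failure, while an effectively unbounded value (modelled here as `math.inf`) never triggers it.

```python
import math

# 6000s is the value quoted in the thread; treated here as seconds.
PAUSE_S = 6000

def format_duration(seconds: int) -> str:
    """Render whole seconds as 'Hh Mm' (6000 -> '1h 40m')."""
    hours, rem = divmod(seconds, 3600)
    return f"{hours}h {rem // 60}m"

def detector_fires(silent_s: float, pause_s: float) -> bool:
    """Hypothetical fixed-threshold detector: it reports a failure once
    no heartbeat has been seen for longer than pause_s seconds."""
    return silent_s > pause_s

print(format_duration(PAUSE_S))            # 1h 40m
print(detector_fires(2 * 3600, PAUSE_S))   # True: a 2h silence trips it
print(detector_fires(2 * 3600, math.inf))  # False: an unbounded pause never trips
```

With a finite threshold, any job that runs longer than the threshold without an accepted heartbeat fails at a fixed wall-clock time, which is exactly the kind of hard-to-debug behaviour described above.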
Would you mind copying this information into a JIRA ticket to make it
easier to discover / track? Thanks!
On Sun, Dec 20, 2015 at 11:35 AM Alexander Pivovarov
wrote:
> A Spark job on EMR usually fails after 1 hour 40 min with the following
> exception: Job cancelled because SparkContext was shut down
>
java.util.concurrent.RejectedExecutionException: Task
scala.concurrent.impl.CallbackRunnable@2d602a14 rejected from
java.util.concurrent.ThreadPoolExecutor@46a9e52[Terminate
Or with this message:
Exception in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:703)
    at org.apache.spark.scheduler.DAGScheduler$$a
It can also fail with the following message:
Exception in thread "main" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 133 in stage 33.1 failed 4 times,
most recent failure: Lost task 133.3 in stage 33.1 (TID 172737,
ip-10-0-25-2.ec2.internal): java.io.IOException: Failed t
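When collecting these failures into a JIRA ticket, it may help to check which of the quoted signatures a given driver log actually contains. The sketch below is a hypothetical helper (`SIGNATURES` and `classify` are illustrative names, not part of Spark); its patterns cover only the message prefixes quoted in this thread, and it collapses whitespace first because the messages above are wrapped across lines.

```python
import re

# Hypothetical signature table built from the messages quoted in this thread.
SIGNATURES = {
    "context_shutdown": re.compile(
        r"Job cancelled because SparkContext was shut down"),
    "rejected_execution": re.compile(
        r"java\.util\.concurrent\.RejectedExecutionException"),
    "stage_failure": re.compile(
        r"Job aborted due to stage failure: Task \d+ in stage [\d.]+ failed \d+ times"),
}

def classify(log_text: str) -> list[str]:
    """Return the names of every known failure signature found in log_text."""
    # Join wrapped lines so a message split across a line break still matches.
    flat = " ".join(log_text.split())
    return [name for name, pattern in SIGNATURES.items() if pattern.search(flat)]

sample = """Exception in thread "main" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 133 in stage 33.1 failed 4 times,"""
print(classify(sample))  # ['stage_failure']
```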