Re: Spark fails after 6000s because of akka

2015-12-20 Thread Alexander Pivovarov
The documentation says that this setting is used to disable the Akka transport failure detector. Why is the magic number 6000s used, then? It should be the maximum possible value instead of 6000s to disable the heartbeat. Using magic numbers like 1 hour and 40 min creates issues which are difficult to debug. Most prob…
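A minimal sketch of the workaround implied here, assuming Spark 1.x: raise the failure-detector timeouts when building the SparkContext. The property names spark.akka.heartbeat.pauses (default 6000s) and spark.akka.heartbeat.interval (default 1000s) are the documented Spark 1.x settings; the 60000s value and the app name below are illustrative assumptions, not taken from this thread.

    // Sketch: push the Akka failure-detector pause tolerance far past the
    // 6000s default so a long-running job is not killed by the heartbeat.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("long-running-emr-job")             // hypothetical app name
      .set("spark.akka.heartbeat.pauses", "60000s")   // default is 6000s
      .set("spark.akka.heartbeat.interval", "1000s")  // keep the default probe interval
    val sc = new SparkContext(conf)

The same properties can also be passed without code changes, e.g. spark-submit --conf spark.akka.heartbeat.pauses=60000s.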

Re: Spark fails after 6000s because of akka

2015-12-20 Thread Josh Rosen
Would you mind copying this information into a JIRA ticket to make it easier to discover / track? Thanks!

On Sun, Dec 20, 2015 at 11:35 AM Alexander Pivovarov wrote:
> Usually the Spark EMR job fails with the following exception in 1 hour 40 min:
> Job cancelled because SparkContext was shut down
> …

Re: Spark fails after 6000s because of akka

2015-12-20 Thread Alexander Pivovarov
Usually the Spark EMR job fails with the following exception in 1 hour 40 min:

Job cancelled because SparkContext was shut down
java.util.concurrent.RejectedExecutionException: Task scala.concurrent.impl.CallbackRunnable@2d602a14 rejected from java.util.concurrent.ThreadPoolExecutor@46a9e52[Terminate…
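(For reference: 6000 s = 100 min = 1 h 40 min, so the reported failure time lines up exactly with the 6000s default of spark.akka.heartbeat.pauses.)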

Re: Spark fails after 6000s because of akka

2015-12-20 Thread Alexander Pivovarov
Or this message:

Exception in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:703)
  at org.apache.spark.scheduler.DAGScheduler$$a…

Re: Spark fails after 6000s because of akka

2015-12-20 Thread Alexander Pivovarov
It can also fail with the following message:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 133 in stage 33.1 failed 4 times, most recent failure: Lost task 133.3 in stage 33.1 (TID 172737, ip-10-0-25-2.ec2.internal): java.io.IOException: Failed t…