OK. I tried setting the partition count to 128 and to values greater than 128,
and now I get a different error, about "Java heap space". Is it possible that
something is wrong with the setup of my Spark cluster to begin with? Or is it
still an issue with how I'm partitioning my data? Or do I just need more
worker nodes?
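For reference, this is roughly what I've been tuning (the values below are just placeholder examples, not my actual settings):

    # Passed via SparkConf / my cluster config -- example values only.
    spark.executor.memory      4g    # heap size of each executor JVM
    spark.default.parallelism  256   # default partition count for shuffles

My understanding is that more partitions makes each individual task smaller, but the executor heap still has to be big enough for however many tasks run concurrently on it, plus any cached data. Please correct me if that's wrong.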


ERROR TaskSetManager: Task 194.0:14 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted: Task 194.0:14 failed 4 times
(most recent failure: Exception failure: java.lang.OutOfMemoryError: Java heap space)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4623.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.