KarthickAN opened a new issue #2154:
URL: https://github.com/apache/hudi/issues/2154


   Any insight into this issue? I keep getting this consistently and need help resolving it.
   
   
   **Stacktrace:**
   py4j.protocol.Py4JJavaError: An error occurred while calling o171.save.
   : org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 9 (flatMapToPair at HoodieBloomIndex.java:302) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:554)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:485)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:64)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:102)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
        at org.apache.spark.scheduler.Task.run(Task.scala:121)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at org.spark_project.guava.base.Throwables.propagate(Throwables.java:160)
        at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:258)
        at org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:70)
        at org.apache.spark.network.crypto.AuthClientBootstrap.doSaslAuth(AuthClientBootstrap.java:115)
        at org.apache.spark.network.crypto.AuthClientBootstrap.doBootstrap(AuthClientBootstrap.java:74)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:257)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:114)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        ... 1 more
   Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:276)
        at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
        at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:254)
        ... 14 more
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1493)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2107)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1364)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1337)
        at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1472)
        at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1472)
        at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1472)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1471)
        at org.apache.spark.api.java.JavaRDDLike$class.isEmpty(JavaRDDLike.scala:544)
        at org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:164)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
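   
   Reading the trace: the failure is a shuffle fetch in the Bloom-index lookup stage (`flatMapToPair at HoodieBloomIndex.java:302`). The executor's SASL bootstrap RPC times out while opening a connection to fetch shuffle blocks, the retries are exhausted, and the stage is aborted after the default 4 consecutive attempts (`spark.stage.maxConsecutiveAttempts`). Below is a minimal sketch, assuming the timeouts stem from slow or overloaded executors, of the standard Spark timeout/retry settings that are typically raised first for this class of `FetchFailedException`; the values are illustrative, not a confirmed fix:
   
   ```python
   # Hedged sketch (assumption, not a confirmed fix): standard Spark settings
   # that raise the shuffle-fetch timeout and retry budget. Values are
   # illustrative and would need tuning for the actual workload.
   from pyspark.sql import SparkSession
   
   spark = (
       SparkSession.builder
       .appName("hudi-insert")                                 # hypothetical app name
       .config("spark.network.timeout", "600s")                # umbrella network timeout, default 120s
       .config("spark.shuffle.io.connectionTimeout", "600s")   # defaults to spark.network.timeout
       .config("spark.shuffle.io.maxRetries", "10")            # default 3
       .config("spark.shuffle.io.retryWait", "30s")            # default 5s
       .getOrCreate()
   )
   ```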
   
   
   **Configuration**
   {
   "hoodie.table.name": "event_processed_cow_jd",
   "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
   "hoodie.datasource.write.recordkey.field": "sourceid,sourceassetid,sourceeventid,value,timestamp",
   "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
   "hoodie.datasource.write.partitionpath.field": "date,sourceid",
   "hoodie.datasource.write.hive_style_partitioning": true,
   "hoodie.datasource.write.table.name": "event_processed_cow_jd",
   "hoodie.datasource.write.operation": "insert",
   "hoodie.parquet.compression.codec": "snappy",
   "hoodie.parquet.compression.ratio": "6",
   "hoodie.parquet.small.file.limit": "104857600",
   "hoodie.parquet.max.file.size": "134217728",
   "hoodie.parquet.block.size": "134217728",
   "hoodie.copyonwrite.insert.split.size": "4880640",
   "hoodie.copyonwrite.record.size.estimate": "165",
   "hoodie.cleaner.commits.retained": 1,
   "hoodie.combine.before.insert": true,
   "hoodie.datasource.write.precombine.field": "timestamp",
   "hoodie.insert.shuffle.parallelism": 10,
   "hoodie.datasource.write.insert.drop.duplicates": true
   }
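   
   For reference, a minimal sketch of how an options map like the one above would be passed to the Hudi datasource from PySpark; the DataFrame `df` and the target S3 path are hypothetical placeholders:
   
   ```python
   # Minimal PySpark sketch of applying the configuration above.
   # `df` and the target S3 path are hypothetical placeholders.
   hudi_options = {
       "hoodie.table.name": "event_processed_cow_jd",
       "hoodie.datasource.write.recordkey.field": "sourceid,sourceassetid,sourceeventid,value,timestamp",
       "hoodie.datasource.write.partitionpath.field": "date,sourceid",
       "hoodie.datasource.write.precombine.field": "timestamp",
       "hoodie.datasource.write.operation": "insert",
       # ... remaining options from the block above ...
   }
   
   (df.write
       .format("org.apache.hudi")   # the Spark datasource shipped with Hudi 0.6.0
       .options(**hudi_options)
       .mode("append")
       .save("s3://your-bucket/event_processed_cow_jd/"))
   ```
   
   The `o171.save` in the Py4J error corresponds to this `save()` call crossing the Python-to-JVM boundary.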
   
   **Environment Description**
   
   Hudi version : 0.6.0
   
   Spark version : 2.4.3
   
   Hadoop version : 2.8.5-amzn-1
   
   Storage (HDFS/S3/GCS..) : S3
   
   Running on Docker? (yes/no) : No. Running on AWS Glue

