tooptoop4 opened a new issue #1802:
URL: https://github.com/apache/hudi/issues/1802


   Hudi 0.5.3: a delete operation on a partitioned copy-on-write (COW) table fails with the following error:
   
   ```
   2020-07-06 08:52:54,847 [main] INFO  org.apache.spark.sql.execution.datasources.FileSourceStrategy - Output Data Schema: struct<city: string, id: bigint, latitude: double, date_time: string ... 2 more fields>
   2020-07-06 08:53:01,260 [dispatcher-event-loop-1] INFO  org.apache.spark.storage.BlockManagerInfo - Added broadcast_30_piece0 in memory on reda.167:38001 (size: 66.4 KB, free: 7.0 GB)
   2020-07-06 08:53:01,273 [dispatcher-event-loop-2] INFO  org.apache.spark.MapOutputTrackerMasterEndpoint - Asked to send map output locations for shuffle 7 to reda.167:56648
   2020-07-06 08:53:03,594 [task-result-getter-0] WARN  org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 27.0 (TID 157, reda.167, executor 0): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:269)
           at 
org.apache.hudi.client.HoodieWriteClient.lambda$upsertRecordsInternal$9c951a5d$1(HoodieWriteClient.java:472)
           at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
           at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
           at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
           at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
           at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
           at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
           at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
           at 
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
           at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
           at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
           at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:123)
           at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: 
org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:262)
           ... 30 more
   Caused by: org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:205)
           ... 32 more
   Caused by: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at java.util.concurrent.FutureTask.report(FutureTask.java:122)
           at java.util.concurrent.FutureTask.get(FutureTask.java:192)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
           ... 33 more
   Caused by: org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:227)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:206)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:257)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$133(BoundedInMemoryExecutor.java:121)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           ... 3 more
   Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read 
value at 0 in block -1 in file 
s3a://reda/3/c0abeb53-d684-4376-b4bd-6fbafa3a41e3-0_1-23-157_20200706085042.parquet
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
           at 
org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
           at 
org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
           at 
org.apache.hudi.client.utils.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
           at 
org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$131(BoundedInMemoryExecutor.java:92)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           ... 4 more
   Caused by: java.lang.UnsupportedOperationException: 
org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
           at 
org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
           at 
org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
           at 
org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
           at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
           at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
           at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
           at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
           at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
           at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
           at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
           ... 11 more
   
   2020-07-06 08:53:03,595 [dispatcher-event-loop-3] INFO  org.apache.spark.scheduler.TaskSetManager - Starting task 0.1 in stage 27.0 (TID 158, reda.167, executor 0, partition 0, NODE_LOCAL, 7666 bytes)
   2020-07-06 08:53:05,636 [task-result-getter-1] INFO  org.apache.spark.scheduler.TaskSetManager - Lost task 0.1 in stage 27.0 (TID 158) on reda.167, executor 0: org.apache.hudi.exception.HoodieUpsertException (Error upserting bucketType UPDATE for partition :0) [duplicate 1]
   2020-07-06 08:53:05,637 [dispatcher-event-loop-3] INFO  org.apache.spark.scheduler.TaskSetManager - Starting task 0.2 in stage 27.0 (TID 159, reda.167, executor 0, partition 0, NODE_LOCAL, 7666 bytes)
   2020-07-06 08:53:08,066 [task-result-getter-3] INFO  org.apache.spark.scheduler.TaskSetManager - Lost task 0.2 in stage 27.0 (TID 159) on reda.167, executor 0: org.apache.hudi.exception.HoodieUpsertException (Error upserting bucketType UPDATE for partition :0) [duplicate 2]
   2020-07-06 08:53:08,067 [dispatcher-event-loop-3] INFO  org.apache.spark.scheduler.TaskSetManager - Starting task 0.3 in stage 27.0 (TID 160, reda.167, executor 0, partition 0, NODE_LOCAL, 7666 bytes)
   2020-07-06 08:53:10,713 [task-result-getter-2] INFO  org.apache.spark.scheduler.TaskSetManager - Lost task 0.3 in stage 27.0 (TID 160) on reda.167, executor 0: org.apache.hudi.exception.HoodieUpsertException (Error upserting bucketType UPDATE for partition :0) [duplicate 3]
   2020-07-06 08:53:10,714 [task-result-getter-2] ERROR org.apache.spark.scheduler.TaskSetManager - Task 0 in stage 27.0 failed 4 times; aborting job
   2020-07-06 08:53:10,715 [task-result-getter-2] INFO  org.apache.spark.scheduler.TaskSchedulerImpl - Removed TaskSet 27.0, whose tasks have all completed, from pool
   2020-07-06 08:53:10,718 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.TaskSchedulerImpl - Cancelling stage 27
   2020-07-06 08:53:10,718 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.TaskSchedulerImpl - Killing all running tasks in stage 27: Stage cancelled
   2020-07-06 08:53:10,719 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler - ResultStage 27 (count at HudiInterface.scala:219) failed in 9.487 s due to Job aborted due to stage failure: Task 0 in stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 (TID 160, reda.167, executor 0): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:269)
           at 
org.apache.hudi.client.HoodieWriteClient.lambda$upsertRecordsInternal$9c951a5d$1(HoodieWriteClient.java:472)
           at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
           at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
           at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
           at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
           at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
           at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
           at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
           at 
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
           at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
           at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
           at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:123)
           at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: 
org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:262)
           ... 30 more
   Caused by: org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:205)
           ... 32 more
   Caused by: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at java.util.concurrent.FutureTask.report(FutureTask.java:122)
           at java.util.concurrent.FutureTask.get(FutureTask.java:192)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
           ... 33 more
   Caused by: org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:227)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:206)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:257)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$133(BoundedInMemoryExecutor.java:121)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           ... 3 more
   Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read 
value at 0 in block -1 in file 
s3a://reda/3/c0abeb53-d684-4376-b4bd-6fbafa3a41e3-0_1-23-157_20200706085042.parquet
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
           at 
org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
           at 
org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
           at 
org.apache.hudi.client.utils.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
           at 
org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$131(BoundedInMemoryExecutor.java:92)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           ... 4 more
   Caused by: java.lang.UnsupportedOperationException: 
org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
           at 
org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
           at 
org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
           at 
org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
           at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
           at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
           at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
           at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
           at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
           at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
           at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
           ... 11 more
   
   Driver stacktrace:
   2020-07-06 08:53:10,720 [main] INFO  org.apache.spark.scheduler.DAGScheduler - Job 13 failed: count at HudiInterface.scala:219, took 9.684038 s
   Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 (TID 160, reda.167, executor 0): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:269)
           at 
org.apache.hudi.client.HoodieWriteClient.lambda$upsertRecordsInternal$9c951a5d$1(HoodieWriteClient.java:472)
           at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
           at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
           at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
           at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
           at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
           at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
           at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
           at 
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
           at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
           at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
           at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:123)
           at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: 
org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:262)
           ... 30 more
   Caused by: org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:205)
           ... 32 more
   Caused by: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at java.util.concurrent.FutureTask.report(FutureTask.java:122)
           at java.util.concurrent.FutureTask.get(FutureTask.java:192)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
           ... 33 more
   Caused by: org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:227)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:206)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:257)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$133(BoundedInMemoryExecutor.java:121)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           ... 3 more
   Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read 
value at 0 in block -1 in file 
s3a://reda/3/c0abeb53-d684-4376-b4bd-6fbafa3a41e3-0_1-23-157_20200706085042.parquet
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
           at 
org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
           at 
org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
           at 
org.apache.hudi.client.utils.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
           at 
org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$131(BoundedInMemoryExecutor.java:92)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           ... 4 more
   Caused by: java.lang.UnsupportedOperationException: 
org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
           at 
org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
           at 
org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
           at 
org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
           at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
           at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
           at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
           at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
           at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
           at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
           at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
           ... 11 more
   
   Driver stacktrace:
           at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
           at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
           at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
           at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
           at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
           at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
           at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
           at scala.Option.foreach(Option.scala:257)
           at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
           at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
           at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
           at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
           at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
           at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
           at org.apache.spark.rdd.RDD.count(RDD.scala:1213)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
           at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting 
bucketType UPDATE for partition :0
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:269)
           at 
org.apache.hudi.client.HoodieWriteClient.lambda$upsertRecordsInternal$9c951a5d$1(HoodieWriteClient.java:472)
           at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
           at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
           at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
           at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
           at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
           at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
           at 
org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
           at 
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
           at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
           at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
           at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:123)
           at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: 
org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:262)
           ... 30 more
   Caused by: org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
           at 
org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:205)
           ... 32 more
   Caused by: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
           at java.util.concurrent.FutureTask.report(FutureTask.java:122)
           at java.util.concurrent.FutureTask.get(FutureTask.java:192)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
           ... 33 more
   Caused by: org.apache.hudi.exception.HoodieException: operation has failed
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:227)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:206)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:257)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$133(BoundedInMemoryExecutor.java:121)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           ... 3 more
   Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read 
value at 0 in block -1 in file 
s3a://reda/3/c0abeb53-d684-4376-b4bd-6fbafa3a41e3-0_1-23-157_20200706085042.parquet
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
           at 
org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
           at 
org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
           at 
org.apache.hudi.client.utils.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
           at 
org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
           at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$131(BoundedInMemoryExecutor.java:92)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           ... 4 more
   Caused by: java.lang.UnsupportedOperationException: 
org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
           at 
org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
           at 
org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
           at 
org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
           at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
           at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
           at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
           at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
           at 
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
           at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
           at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
           at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
           ... 11 more
   2020-07-06 08:53:10,745 [pool-1-thread-1] INFO  
org.apache.spark.SparkContext - Invoking stop() from shutdown hook
   2020-07-06 08:53:10,754 [pool-1-thread-1] INFO  
org.spark_project.jetty.server.AbstractConnector - Stopped 
Spark@76a1146d{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
   2020-07-06 08:53:10,766 [pool-1-thread-1] INFO  org.apache.spark.ui.SparkUI 
- Stopped Spark web UI at http://reda.232:4041
   2020-07-06 08:53:10,776 [pool-1-thread-1] INFO  
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Shutting down 
all executors
   2020-07-06 08:53:10,776 [dispatcher-event-loop-1] INFO  
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint 
- Asking each executor to shut down
   2020-07-06 08:53:13,401 [dispatcher-event-loop-0] INFO  
org.apache.spark.MapOutputTrackerMasterEndpoint - 
MapOutputTrackerMasterEndpoint stopped!
   2020-07-06 08:53:13,799 [pool-1-thread-1] INFO  
org.apache.spark.storage.memory.MemoryStore - MemoryStore cleared
   2020-07-06 08:53:13,802 [pool-1-thread-1] INFO  
org.apache.spark.storage.BlockManager - BlockManager stopped
   2020-07-06 08:53:13,804 [pool-1-thread-1] INFO  
org.apache.spark.storage.BlockManagerMaster - BlockManagerMaster stopped
   2020-07-06 08:53:13,869 [dispatcher-event-loop-3] INFO  
org.apache.spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint
 - OutputCommitCoordinator stopped!
   2020-07-06 08:53:14,169 [pool-1-thread-1] INFO  
org.apache.spark.SparkContext - Successfully stopped SparkContext
   2020-07-06 08:53:14,171 [pool-1-thread-1] INFO  
org.apache.spark.util.ShutdownHookManager - Shutdown hook called
   2020-07-06 08:53:14,172 [pool-1-thread-1] INFO  
org.apache.spark.util.ShutdownHookManager - Deleting directory 
/tmp/spark-abe533e8-3621-41d7-85d4-ada816644515
   2020-07-06 08:53:14,238 [pool-1-thread-1] INFO  
org.apache.spark.util.ShutdownHookManager - Deleting directory 
/tmp/spark-e692da4e-5169-4704-9675-9c9d853d6069
   ```
   



