HariprasadAllaka1612 opened a new issue #1641:
URL: https://github.com/apache/incubator-hudi/issues/1641
**Parquet schema changing for various writes to Hudi**

With continuous writes to S3 in Hudi format, there are instances where the schema of the Parquet files changes, and when writing/upserting to the same partition we get a merge error. I am using the COW storage type.
**To Reproduce**
Steps to reproduce the behavior:
1. Write the dataframe multiple times to the same partition.
**Expected behavior**
1. The same schema for all the parquet files in the partition.
**Environment Description**
* Hudi version : 0.5.1
* Spark version : 2.4.0
* Hive version : 2.3.4
* Hadoop version : 2.8.5
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No
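
The root `ClassCastException` (`org.apache.avro.util.Utf8 cannot be cast to java.lang.Number`) in the log below suggests that a field the existing Parquet files store as a number arrived as a string in a later batch, so the merge tries to write a string value through the numeric table schema. As a minimal sketch of that failure mode and one mitigation — plain Python, not the Hudi API, with invented field names — incoming records can be coerced to the types the table established before upserting:

```python
# Hypothetical table schema: the types the first write to the table established.
# Field names here are invented for illustration.
TABLE_SCHEMA = {"message_id": str, "login_count": int, "score": float}

def normalize(record: dict) -> dict:
    """Cast record values to the table's expected types; raise on values
    that cannot be coerced instead of silently drifting the schema."""
    out = {}
    for field, expected in TABLE_SCHEMA.items():
        value = record.get(field)
        if value is None:
            out[field] = None             # nullable field: pass through
        elif isinstance(value, expected):
            out[field] = value            # already the right type
        else:
            out[field] = expected(value)  # e.g. "7" -> 7; raises if impossible
    return out

# A later batch where numeric fields arrived as strings -- the kind of drift
# that surfaces in Hudi as "Utf8 cannot be cast to Number" during merge:
drifted = {"message_id": "abc-123", "login_count": "7", "score": "1.5"}
print(normalize(drifted))  # {'message_id': 'abc-123', 'login_count': 7, 'score': 1.5}
```

In the actual Spark job, the analogous step would be casting each column to an explicit, fixed schema (e.g. `col("login_count").cast("long")`) before `df.write.format("hudi")`, so every commit presents an identical schema to the merge.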
**Stacktrace** (line wrapping repaired; verbatim-repeated cause chains elided)

```
2020-05-19 21:06:56 ERROR BoundedInMemoryExecutor:130 - error consuming records
org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:496071269677683442614463247120938275415648800229692538900aa07220-d2a1-4f87-82ed-1348bf6df155 from old file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_1-213-8447_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_0-118-298_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
2020-05-19 21:06:59 ERROR HoodieCopyOnWriteTable:272 - Error upserting bucketType UPDATE for partition :1
org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
2020-05-19 21:06:59 ERROR Executor:91 - Exception in task 1.0 in stage 118.0 (TID 299)
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :1
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:273)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    [remaining frames and cause chain identical to the previous error, elided; root cause again
     java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number]
2020-05-19 21:06:59 ERROR TaskSetManager:70 - Task 1 in stage 118.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 118.0 failed 1 times, most recent failure: Lost task 1.0 in stage 118.0 (TID 299, localhost, executor driver): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :1
    [executor-side stack trace and cause chain identical to the previous error, elided]
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1168)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:145)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
    at com.playngodataengg.dao.DataAccessS3.writeDataToRefinedHudiS3(DataAccessS3.scala:149)
    at com.playngodataengg.controller.LoginDataTransform.processData(LoginDataTransform.scala:368)
    at com.playngodataengg.action.LoginData$.main(LoginData.scala:16)
    at com.playngodataengg.action.LoginData.main(LoginData.scala)
Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :1
    [stack trace and cause chain identical to the previous errors, elided; root cause again
     java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number]
```
at
org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
at
org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
... 8 more
2020-05-19 21:06:59 ERROR DataEngineering:12 - (writeDataToRefinedHudiS3) - There is an exception writing the data into data lake for login
2020-05-19 21:06:59 ERROR HoodieCopyOnWriteTable:272 - Error upserting bucketType UPDATE for partition :3
org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446357933851667324507459552012140546f6643052-e862-41dc-a4cc-22150ef7a240 from old file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_0-213-8446_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_3-118-301_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446357933851667324507459552012140546f6643052-e862-41dc-a4cc-22150ef7a240 from old file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_0-213-8446_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_3-118-301_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446357933851667324507459552012140546f6643052-e862-41dc-a4cc-22150ef7a240 from old file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_0-213-8446_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_3-118-301_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446357933851667324507459552012140546f6643052-e862-41dc-a4cc-22150ef7a240 from old file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_0-213-8446_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_3-118-301_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
2020-05-19 21:07:00 ERROR HoodieCopyOnWriteTable:272 - Error upserting bucketType UPDATE for partition :2
org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446359161842247114459510822055968770b4bc494b-e50a-4118-86d6-efe500d13270 from old file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-213-8448_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-118-300_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446359161842247114459510822055968770b4bc494b-e50a-4118-86d6-efe500d13270 from old file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-213-8448_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-118-300_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446359161842247114459510822055968770b4bc494b-e50a-4118-86d6-efe500d13270 from old file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-213-8448_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-118-300_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446359161842247114459510822055968770b4bc494b-e50a-4118-86d6-efe500d13270 from old file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-213-8448_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-118-300_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
2020-05-19 21:07:00 ERROR HoodieCopyOnWriteTable:272 - Error upserting bucketType UPDATE for partition :0
org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:496071269677683442614463247120938275415648800229692538900aa07220-d2a1-4f87-82ed-1348bf6df155 from old file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_1-213-8447_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_0-118-298_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:496071269677683442614463247120938275415648800229692538900aa07220-d2a1-4f87-82ed-1348bf6df155 from old file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_1-213-8447_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_0-118-298_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:496071269677683442614463247120938275415648800229692538900aa07220-d2a1-4f87-82ed-1348bf6df155 from old file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_1-213-8447_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_0-118-298_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:496071269677683442614463247120938275415648800229692538900aa07220-d2a1-4f87-82ed-1348bf6df155 from old file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_1-213-8447_20200519162625.parquet to new file s3a://gat-datalake-refined-dev/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_0-118-298_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
Process finished with exit code 0
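For context on the log above: every chain bottoms out in `java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number`, which suggests schema drift between writes: a field that earlier commits stored as a numeric type arrives as a string in a later batch, so Hudi fails while merging the new records into the old Parquet file. A common workaround is to pin the batch to one target schema before every upsert. The stdlib-Python sketch below shows the idea with hypothetical field names (`message_id`, `login_count` are illustrative, not from this table); in Spark the equivalent would be explicit `cast` calls on the DataFrame columns before the Hudi write.

```python
# Illustrative target schema: the types the table's existing commits expect.
# Field names are hypothetical, chosen only to demonstrate the coercion step.
TARGET_SCHEMA = {"message_id": str, "login_count": int}

def coerce(record: dict) -> dict:
    """Coerce one record to TARGET_SCHEMA.

    A batch whose schema inference turned `login_count` into a string is
    exactly the situation that triggers the Utf8-to-Number cast failure:
    the old Parquet file says "number", the new record says "string".
    """
    coerced = {}
    for field, expected_type in TARGET_SCHEMA.items():
        value = record[field]
        coerced[field] = value if isinstance(value, expected_type) else expected_type(value)
    return coerced

# A drifted batch: login_count arrived as a string instead of an int.
drifted_batch = [{"message_id": "abc-123", "login_count": "42"}]
clean_batch = [coerce(r) for r in drifted_batch]
print(clean_batch[0]["login_count"])  # -> 42, an int matching older commits
```

Until the type change is handled by schema evolution on the Hudi side, keeping every writer on one agreed schema (or casting as above before each write) should avoid the merge failure.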