rbtrtr opened a new issue, #6398:
URL: https://github.com/apache/hudi/issues/6398
**Description**
We're running on a Cloudera CDP stack and want to upgrade to Hudi 0.11.1 to
take advantage of the metadata table feature. When we ran a simple Hudi
write with generated data, we got the attached stacktrace.
We used this Hudi bundle:
`org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1`.
The exception suggests that something is incompatible with the HBase
version Hudi is compiled against. Unfortunately, Cloudera ships HBase in
version 2.2.3. We're not sure this is actually the root cause, but why
does Hudi need this library at all if nothing is stored in HBase?
If we set _hoodie.metadata.enable_ to _false_, the write succeeds, but we
would like to use this feature.
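For reference, a minimal sketch of the workaround as an option map (the keys are the same string configs used in the failing write below; flipping _hoodie.metadata.enable_ to `"false"` is the only change):

```scala
// Workaround sketch: hoodie.metadata.enable = "false" avoids the
// HFile/HBase code path entirely, at the cost of losing the metadata table.
val workaroundOpts: Map[String, String] = Map(
  "hoodie.table.name"      -> "ht_hudi_11_1_metadata",
  "hoodie.index.type"      -> "BLOOM",
  "hoodie.metadata.enable" -> "false" // "true" triggers the stacktrace below
)

// Applied via: df.write.format("hudi").options(workaroundOpts)...
```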
We tried two things to get rid of this exception:
1) Set the index type to BLOOM -> no effect
2) Explicitly add the HBase server and client jars, in the version Hudi is
compiled against, to the spark-shell classpath -> no effect
**Environment Description**
* Hudi version : 0.11.1
* Spark version : 3.1.1
* Hive version : 3.1.3
* Hadoop version : 3.1.1
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no (YARN on Cloudera CDP 7.1.7)
**Additional context**
Example write:
```scala
df.write.format("hudi")
.option(HIVE_CREATE_MANAGED_TABLE.key(), false)
.option(HIVE_DATABASE.key(), "db_demo")
.option(HIVE_SYNC_ENABLED.key(), true)
.option(HIVE_SYNC_MODE.key(), "HMS")
.option(HIVE_TABLE.key(), "ht_hudi_11_1_metadata")
.option("hoodie.table.name", "ht_hudi_11_1_metadata")
.option(KEYGENERATOR_CLASS_NAME.key(),
"org.apache.hudi.keygen.NonpartitionedKeyGenerator")
.option(OPERATION.key(), "upsert")
.option(PRECOMBINE_FIELD.key(), "sequence")
.option(RECORDKEY_FIELD.key(), "id")
.option(TABLE_NAME.key(), "ht_hudi_11_1_metadata")
.option("hoodie.index.type","BLOOM")
.option("hoodie.metadata.enable", true)
.mode("append")
.save("hdfs:///.../hudi_11_1_metadata")
```
**Stacktrace**
```java
Caused by: java.lang.ExceptionInInitializerError
    at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
    at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
    at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
    at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
    ... 28 more
Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for an older version of HBase (2.2.3.7.1.7.0-551), this version is 2.4.9
    at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:74)
    at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:84)
    at org.apache.hudi.org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:98)
    at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Context.<init>(Context.java:44)
    at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<init>(Encryption.java:110)
    at org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context.<clinit>(Encryption.java:107)
    ... 36 more
........
22/08/12 08:19:20 ERROR scheduler.TaskSetManager: Task 0 in stage 6.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 9) (hdl-w05.charite.de executor 1): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
    at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
    at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.crypto.Encryption$Context
    at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileContextBuilder.<init>(HFileContextBuilder.java:54)
    at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:105)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
    at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
    at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
```