xushiyan commented on issue #5698:
URL: https://github.com/apache/hudi/issues/5698#issuecomment-1141895241
@omlomloml the root cause is most likely that the classpath is missing this
jar from the Hive installation:
```shell
➜ ls ~/Downloads/apache-hive-3.1.3-bin/lib | grep parquet
parquet-hadoop-bundle-1.10.0.jar
```
When the script runs, it picks only a few Hive jars and puts them on the
classpath. The script was written for Hive 2; the dependencies probably changed
in Hive 3, so we need to add more jars.
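To check whether a given Hive installation ships the jar, something like the following could be used. This is only a minimal sketch: the helper name `has_jar` is made up for illustration and is not part of the sync script.

```shell
# Hypothetical helper (not part of the Hudi script): succeed if directory $1
# contains at least one jar whose file name contains the string $2.
has_jar() {
  dir=$1
  pattern=$2
  for f in "$dir"/*"$pattern"*.jar; do
    # If the glob matched nothing, $f is the unexpanded pattern and -e fails.
    [ -e "$f" ] && return 0
  done
  return 1
}
```

For example, `has_jar ~/Downloads/apache-hive-3.1.3-bin/lib parquet-hadoop-bundle || echo "jar not found"` would report whether the jar listed above is present.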
I was able to run the Hive sync tool successfully as a plain Java program, with
a setup similar to the script's, on EMR 6.6 with args like
```shell
tableName=footable
HUDI_HIVE_UBER_JAR=/home/hadoop/hudi-hive-sync-bundle-0.11.0.jar
# Quoted so the shell leaves the * alone; the JVM expands it to every jar in the directory.
HIVE_JARS='/usr/lib/hive/lib/*'
java -cp "${HUDI_HIVE_UBER_JAR}:${HIVE_JARS}:$(hadoop classpath)" \
org.apache.hudi.hive.HiveSyncTool \
--database rxusandbox \
--table ${tableName} \
--metastore-uris thrift://hive-metastore:9083 \
--base-path s3://xxx/${tableName} \
--sync-mode hms \
--partition-value-extractor org.apache.hudi.hive.NonPartitionedExtractor
```
As you can see, here we pick up all jars from the Hive and Hadoop
installations, and it worked. So the dependencies are not missing from
hudi-hive-sync-bundle but from the classpath. We need to update the script
accordingly.
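One possible shape for that change is to build the classpath from the whole Hive lib directory instead of a hand-picked list of jars. This is only a sketch under assumptions: `build_classpath` and its arguments are hypothetical names, not what the script currently contains.

```shell
# Hypothetical helper: print a classpath made of the bundle jar ($1), every jar
# in the Hive lib directory ($2, via Java's classpath wildcard), and the Hadoop
# classpath ($3).
build_classpath() {
  bundle_jar=$1
  hive_lib_dir=$2
  hadoop_cp=$3
  # The /* is inside a quoted format string, so the shell does not glob it;
  # the JVM expands it to all jar files in that directory.
  printf '%s:%s/*:%s\n' "$bundle_jar" "$hive_lib_dir" "$hadoop_cp"
}
```

It could then be called as `java -cp "$(build_classpath "$HUDI_HIVE_UBER_JAR" /usr/lib/hive/lib "$(hadoop classpath)")" org.apache.hudi.hive.HiveSyncTool`, followed by the same arguments as in the command above.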