xushiyan commented on issue #5698:
URL: https://github.com/apache/hudi/issues/5698#issuecomment-1141895241

   @omlomloml the most likely root cause is that the classpath is missing this 
jar from the Hive installation:
   
   ```shell
   ➜ ls ~/Downloads/apache-hive-3.1.3-bin/lib | grep parquet
   parquet-hadoop-bundle-1.10.0.jar
   ```
   
   When the script runs, it picks a few Hive jars and puts them on the 
classpath. The script was written for Hive 2; the dependencies probably changed 
in Hive 3, so we need to add more jars.
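
As a rough illustration of "adding more jars", here is a minimal bash sketch. `collect_jars` is a hypothetical helper, not part of the actual script, and the jar prefixes in the usage comment are illustrative assumptions; only `parquet-hadoop-bundle` comes from the listing above.

```shell
# Hypothetical helper: print a colon-separated list of the jars in a
# directory whose file names start with any of the given prefixes.
collect_jars() {
  local lib_dir="$1"; shift
  local prefix jar result=""
  for prefix in "$@"; do
    for jar in "$lib_dir"/"$prefix"*.jar; do
      [ -e "$jar" ] || continue          # skip patterns that matched nothing
      result="${result:+$result:}$jar"   # append with ':' only when non-empty
    done
  done
  printf '%s\n' "$result"
}

# Usage (the path and the first two prefixes are assumptions):
# HIVE_JARS=$(collect_jars /usr/lib/hive/lib hive-metastore hive-exec parquet-hadoop-bundle)
```

Matching on name prefixes rather than exact file names keeps the list independent of the jar versions shipped with a given Hive release.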
   
   I was able to run the Hive sync tool successfully on EMR 6.6 as a plain Java 
program, with a setup similar to the script's and args like:
   
   ```shell
   tableName=footable
   HUDI_HIVE_UBER_JAR=/home/hadoop/hudi-hive-sync-bundle-0.11.0.jar
   HIVE_JARS=/usr/lib/hive/lib/*
   java -cp "${HUDI_HIVE_UBER_JAR}:${HIVE_JARS}:$(hadoop classpath)" \
     org.apache.hudi.hive.HiveSyncTool \
   --database rxusandbox \
   --table ${tableName} \
   --metastore-uris thrift://hive-metastore:9083 \
   --base-path s3://xxx/${tableName} \
   --sync-mode hms \
   --partition-value-extractor org.apache.hudi.hive.NonPartitionedExtractor
   ```
   
   As you can see, here we pick up all jars from the Hive and Hadoop 
installations, and it worked. So the dependency is not missing from 
hudi-hive-sync-bundle, but from the classpath. We would need to update the 
script accordingly.
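
To confirm that a given classpath actually covers a jar before running the tool, a check along these lines might help. This is a sketch assuming bash; `classpath_has_jar` is a hypothetical helper and is not part of Hudi or Hive. It handles both explicit jar entries and Java-style `dir/*` wildcard entries.

```shell
# Hypothetical helper: succeed if a colon-separated classpath contains an
# existing jar whose file name starts with the given prefix.
classpath_has_jar() {
  local classpath="$1" prefix="$2" entry candidate
  local -a entries
  IFS=':' read -r -a entries <<< "$classpath"   # split without glob expansion
  for entry in "${entries[@]}"; do
    case "$entry" in
      */\*)
        # Java-style wildcard entry (dir/*): look inside the directory.
        for candidate in "${entry%/\*}"/"$prefix"*.jar; do
          [ -e "$candidate" ] && return 0
        done
        ;;
      *)
        # Explicit entry: compare the file name only.
        case "${entry##*/}" in
          "$prefix"*.jar) [ -e "$entry" ] && return 0 ;;
        esac
        ;;
    esac
  done
  return 1
}

# Usage (variables as in the command above):
# classpath_has_jar "${HUDI_HIVE_UBER_JAR}:${HIVE_JARS}" parquet-hadoop-bundle \
#   || echo "parquet-hadoop-bundle is not on the classpath" >&2
```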

