GnsCy opened a new issue, #5729:
URL: https://github.com/apache/hudi/issues/5729

   Running the demo setup as described 
[here](https://hudi.apache.org/docs/docker_demo) for v0.11 results in missing-jar 
errors when running the `spark-submit` and `hive-sync` commands.
   
   Steps to reproduce the behavior:
   
   1. Clone the repo and switch to the 0.11 release tag
   2. Set up the Docker environment
   3. Publish events to Kafka
   4. Try to run the spark-submit job to ingest data
   
   **Expected behavior**
   
   The demo environment is set up correctly and I am able to go through all the 
scenarios of the demo.
   
   **Environment Description**
   
   * Hudi version : 0.11
   
   * Spark version : 2.4.4
   
   * Hive version : 2.3.3
   
   * Hadoop version : 2.8.4
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : yes
   
   
   
   **Stacktrace**
   
   ```
   spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
     --table-type COPY_ON_WRITE \
     --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
     --source-ordering-field ts \
     --target-base-path /user/hive/warehouse/stock_ticks_cow \
     --target-table stock_ticks_cow \
     --props /var/demo/config/kafka-source.properties \
     --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   22/05/31 06:54:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   22/05/31 06:54:24 WARN DependencyUtils: Local jar /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar does not exist, skipping.
   22/05/31 06:54:24 WARN SparkSubmit$$anon$2: Failed to load org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.
   java.lang.ClassNotFoundException: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:806)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
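   
   The `DependencyUtils` warning appears to be the real cause: `$HUDI_UTILITIES_BUNDLE` resolves to a jar that does not exist, so `spark-submit` skips it and the DeltaStreamer class never makes it onto the classpath. A minimal sanity check I ran inside the container (assuming the demo's default bundle path taken from the warning above; adjust for your checkout):
   
   ```shell
   # Minimal check: verify the utilities bundle was actually built.
   # The default path below is an assumption, copied from the
   # "Local jar ... does not exist, skipping" warning.
   BUNDLE="${HUDI_UTILITIES_BUNDLE:-/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar}"
   if [ -f "$BUNDLE" ]; then
     echo "utilities bundle present: $BUNDLE"
   else
     echo "utilities bundle missing: $BUNDLE"
   fi
   ```
   
   In my environment this prints "missing", which matches the warning.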
   
   ```
   hive-sync ->
   Exception in thread "main" org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing stock_ticks_cow
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:141)
        at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:433)
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL
     CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`stock_ticks_cow`(
       `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string,
       `_hoodie_partition_path` string, `_hoodie_file_name` string, `volume` bigint, `ts` string,
       `symbol` string, `year` int, `month` string, `high` double, `low` double, `key` string,
       `date` string, `close` double, `open` double, `day` string)
     PARTITIONED BY (`dt` String)
     ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
     WITH SERDEPROPERTIES ('hoodie.query.as.ro.table'='false','path'='/user/hive/warehouse/stock_ticks_cow')
     STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
     OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
     LOCATION '/user/hive/warehouse/stock_ticks_cow'
     TBLPROPERTIES('spark.sql.sources.schema.partCol.0'='dt','spark.sql.sources.schema.numParts'='1','spark.sql.sources.schema.numPartCols'='1','spark.sql.sources.provider'='hudi','spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"volume","type":"long","nullable":false,"metadata":{}},{"name":"ts","type":"string","nullable":false,"metadata":{}},{"name":"symbol","type":"string","nullable":false,"metadata":{}},{"name":"year","type":"integer","nullable":false,"metadata":{}},{"name":"month","type":"string","nullable":false,"metadata":{}},{"name":"high","type":"double","nullable":false,"metadata":{}},{"name":"low","type":"double","nullable":false,"metadata":{}},{"name":"key","type":"string","nullable":false,"metadata":{}},{"name":"date","type":"string","nullable":false,"metadata":{}},{"name":"close","type":"double","nullable":false,"metadata":{}},{"name":"open","type":"double","nullable":false,"metadata":{}},{"name":"day","type":"string","nullable":false,"metadata":{}},{"name":"dt","type":"string","nullable":false,"metadata":{}}]}')
        at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:67)
        at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.createTable(QueryBasedDDLExecutor.java:84)
        at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:168)
        at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:276)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:217)
        at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:150)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:138)
        ... 1 more
   Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Cannot find class 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:267)
        at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:253)
        at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:313)
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:253)
        at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:65)
        ... 7 more
   ```
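   
   The hive-sync failure looks like the same missing-build problem seen from Hive's side: the `SemanticException` means HiveServer2 cannot load `org.apache.hudi.hadoop.HoodieParquetInputFormat` from any jar on its classpath. A rough check I tried (the directory and jar-name pattern are assumptions, based on the hive_base target path in the spark-submit warning above):
   
   ```shell
   # Hypothetical check: look for a hudi hadoop-mr jar under the demo's
   # hive_base target directory (the same directory the spark-submit
   # warning referenced). An empty result would mean Hive has no Hudi
   # input format available, matching the SemanticException.
   TARGET_DIR="${TARGET_DIR:-/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target}"
   found=$(find "$TARGET_DIR" -name 'hoodie-hadoop-mr*.jar' 2>/dev/null | head -n 1)
   if [ -n "$found" ]; then
     echo "hadoop-mr jar present: $found"
   else
     echo "no hadoop-mr jar under $TARGET_DIR"
   fi
   ```
   
   If both jars are missing, it suggests the demo images expect the project to have been built locally before `setup_demo.sh` is run.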
   
   

