Aruun opened a new issue #4701:
URL: https://github.com/apache/hudi/issues/4701


   Command used:
   spark-submit \
     --jars /usr/lib/hudi/hudi-utilities-bundle_2.12-0.8.0-amzn-0.jar,/usr/lib/spark/external/lib/spark-avro.jar \
     --deploy-mode cluster --master yarn \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     /usr/lib/hudi/hudi-utilities-bundle_2.12-0.8.0-amzn-0.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field registration_dttm \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --target-base-path s3://<bucketname><path> \
     --target-table hudi_test \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer,org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://<bucket><path>,hoodie.datasource.write.recordkey.field=<field>,hoodie.datasource.write.partitionpath.field=<field>
   
   
   Steps to reproduce the behavior:
   
   1. Run the above command on EMR 6.4 with Spark 3.1.2 and Hive 3.1.2.
   2. Rearranging the order of the command's arguments did not help; the same error occurs.
   
   
   **Expected behavior**
   The data is loaded into a Hudi dataset using DeltaStreamer.
   
   
   **Environment Description**
   
   * Hudi version : 0.8.0
   * Spark version : 3.1.2
   * Hive version : 3.1.2
   * Hadoop version :
   * Storage (HDFS/S3/GCS..) : S3
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   ```
   22/01/27 16:56:53 ERROR Client: Application diagnostics message: User class threw exception: java.io.IOException: Could not load key generator class org.apache.hudi.keygen.SimpleKeyGenerator
        at org.apache.hudi.DataSourceUtils.createKeyGenerator(DataSourceUtils.java:99)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.<init>(DeltaSync.java:209)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:562)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:140)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:103)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:472)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:98)
        at org.apache.hudi.DataSourceUtils.createKeyGenerator(DataSourceUtils.java:97)
        ... 10 more
   Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:87)
        ... 12 more
   Caused by: java.lang.IllegalArgumentException: Property hoodie.datasource.write.recordkey.field not found
        at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:43)
        at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:56)
        at org.apache.hudi.keygen.SimpleKeyGenerator.<init>(SimpleKeyGenerator.java:36)
        ... 17 more

   Exception in thread "main" org.apache.spark.SparkException: Application application_1643297796042_0012 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1253)
        at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1645)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1047)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1056)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
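
The innermost cause in the trace is that `hoodie.datasource.write.recordkey.field` was not found in the properties handed to `SimpleKeyGenerator`, even though the command sets it inside a single comma-joined `--hoodie-conf` value. One thing that may be worth checking (an assumption, not something the trace confirms) is whether passing each property through its own `--hoodie-conf` flag changes the outcome, since the DeltaStreamer help text describes that option as repeatable. The Python sketch below only builds such an argument list; `hoodie_conf_args` and `REQUIRED_KEYS` are hypothetical names introduced for illustration and are not part of Hudi.

```python
# Hypothetical pre-flight helper (not part of Hudi): emits one
# --hoodie-conf flag per property instead of a comma-joined value.

# The property the stack trace reports as missing.
REQUIRED_KEYS = ("hoodie.datasource.write.recordkey.field",)


def hoodie_conf_args(props):
    """Return ['--hoodie-conf', 'key=value', ...] for every property."""
    missing = [key for key in REQUIRED_KEYS if key not in props]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    args = []
    for key, value in props.items():
        args += ["--hoodie-conf", f"{key}={value}"]
    return args


# Placeholders are kept exactly as they appear in the reported command.
props = {
    "hoodie.deltastreamer.source.dfs.root": "s3://<bucket><path>",
    "hoodie.datasource.write.recordkey.field": "<field>",
    "hoodie.datasource.write.partitionpath.field": "<field>",
}
args = hoodie_conf_args(props)
print(" ".join(args))
```

The resulting flags would replace the single `--hoodie-conf` argument at the end of the command; everything else in the command stays unchanged.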
   
   


