Aruun opened a new issue #4701: URL: https://github.com/apache/hudi/issues/4701
Command used:

    spark-submit \
      --jars /usr/lib/hudi/hudi-utilities-bundle_2.12-0.8.0-amzn-0.jar,/usr/lib/spark/external/lib/spark-avro.jar \
      --deploy-mode cluster \
      --master yarn \
      --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
      /usr/lib/hudi/hudi-utilities-bundle_2.12-0.8.0-amzn-0.jar \
      --table-type COPY_ON_WRITE \
      --source-ordering-field registration_dttm \
      --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
      --target-base-path s3://<bucketname><path> \
      --target-table hudi_test \
      --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer,org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
      --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
      --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://<bucket><path>,hoodie.datasource.write.recordkey.field=<field>,hoodie.datasource.write.partitionpath.field=<field>

Steps to reproduce the behavior:

1. Run the above command on EMR 6.4 with Spark 3.1.2 and Hive 3.1.2.
2. Rearranging the order of the command's arguments did not help; the same error occurs.

**Expected behavior**

Load data into a Hudi dataset using DeltaStreamer.

**Environment Description**

* Hudi version : 0.8.0
* Spark version : 3.1.2
* Hive version : 3.1.2
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Stacktrace**

```
22/01/27 16:56:53 ERROR Client: Application diagnostics message: User class threw exception: java.io.IOException: Could not load key generator class org.apache.hudi.keygen.SimpleKeyGenerator
    at org.apache.hudi.DataSourceUtils.createKeyGenerator(DataSourceUtils.java:99)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.<init>(DeltaSync.java:209)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:562)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:140)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:103)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:472)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:98)
    at org.apache.hudi.DataSourceUtils.createKeyGenerator(DataSourceUtils.java:97)
    ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:87)
    ... 12 more
Caused by: java.lang.IllegalArgumentException: Property hoodie.datasource.write.recordkey.field not found
    at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:43)
    at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:56)
    at org.apache.hudi.keygen.SimpleKeyGenerator.<init>(SimpleKeyGenerator.java:36)
    ... 17 more
Exception in thread "main" org.apache.spark.SparkException: Application application_1643297796042_0012 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1253)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1645)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1047)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1056)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
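The innermost `Caused by` reports that `hoodie.datasource.write.recordkey.field` was not found, even though that key appears inside the single `--hoodie-conf` value. One possible explanation, which is an assumption and not confirmed anywhere in this report, is that each `--hoodie-conf` value is read as one properties-file line, so a comma-joined value yields only one property. The sketch below uses only `java.util.Properties` to illustrate that behavior. The class `HoodieConfSketch`, the helper `load`, and the sample values `s3://bucket/path`, `id`, and `dt` are hypothetical and are not taken from Hudi.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class HoodieConfSketch {

    // Hypothetical helper: treats each --hoodie-conf argument as one line of a
    // properties file. This parsing model is an assumption, not Hudi's code.
    public static Properties load(List<String> hoodieConfArgs) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(String.join("\n", hoodieConfArgs)));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String key = "hoodie.datasource.write.recordkey.field";

        // One argument holding comma-joined pairs, shaped like the reported command.
        Properties joined = load(Arrays.asList(
            "hoodie.deltastreamer.source.dfs.root=s3://bucket/path,"
                + "hoodie.datasource.write.recordkey.field=id,"
                + "hoodie.datasource.write.partitionpath.field=dt"));
        System.out.println(joined.size());           // prints 1
        System.out.println(joined.getProperty(key)); // prints null

        // One key=value pair per argument.
        Properties separate = load(Arrays.asList(
            "hoodie.deltastreamer.source.dfs.root=s3://bucket/path",
            "hoodie.datasource.write.recordkey.field=id",
            "hoodie.datasource.write.partitionpath.field=dt"));
        System.out.println(separate.size());           // prints 3
        System.out.println(separate.getProperty(key)); // prints id
    }
}
```

In the comma-joined case, everything after the first `=` becomes the value of `hoodie.deltastreamer.source.dfs.root`, so the record key property is never defined. If the assumption holds, passing one `--hoodie-conf key=value` pair per flag may be worth trying.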