pushpavanthar commented on issue #8614:
URL: https://github.com/apache/hudi/issues/8614#issuecomment-1772585088

   @danny0405 we are facing the same issue on Hudi 0.13.1 with Spark 3.2.1 and 3.3.2. Below is the command we use to run the job; the same command used to work fine with Hudi 0.11.1.
   ```shell
   spark-submit --master yarn \
     --packages org.apache.spark:spark-avro_2.12:3.2.1,org.apache.hudi:hudi-utilities-bundle_2.12:0.13.1,org.apache.hudi:hudi-spark3.2-bundle_2.12:0.13.1,org.apache.hudi:hudi-aws-bundle:0.13.1 \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --conf spark.executor.cores=5 \
     --conf spark.driver.memory=3200m \
     --conf spark.driver.memoryOverhead=800m \
     --conf spark.executor.memoryOverhead=1400m \
     --conf spark.executor.memory=14600m \
     --conf spark.dynamicAllocation.enabled=true \
     --conf spark.dynamicAllocation.initialExecutors=1 \
     --conf spark.dynamicAllocation.minExecutors=1 \
     --conf spark.dynamicAllocation.maxExecutors=21 \
     --conf spark.scheduler.mode=FAIR \
     --conf spark.task.maxFailures=5 \
     --conf spark.rdd.compress=true \
     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
     --conf spark.shuffle.service.enabled=true \
     --conf spark.sql.hive.convertMetastoreParquet=false \
     --conf spark.yarn.max.executor.failures=5 \
     --conf spark.driver.userClassPathFirst=true \
     --conf spark.executor.userClassPathFirst=true \
     --conf spark.sql.catalogImplementation=hive \
     --deploy-mode client \
     s3://bucket_name/custom_jar-2.0.jar \
     --hoodie-conf hoodie.parquet.compression.codec=snappy \
     --hoodie-conf hoodie.deltastreamer.source.hoodieincr.num_instants=100 \
     --table-type COPY_ON_WRITE \
     --source-class org.apache.hudi.utilities.sources.HoodieIncrSource \
     --hoodie-conf hoodie.deltastreamer.source.hoodieincr.path=s3://bucket_name/ml_attributes/features \
     --hoodie-conf hoodie.metrics.on=true \
     --hoodie-conf hoodie.metrics.reporter.type=PROMETHEUS_PUSHGATEWAY \
     --hoodie-conf hoodie.metrics.pushgateway.host=pushgateway.in \
     --hoodie-conf hoodie.metrics.pushgateway.port=443 \
     --hoodie-conf hoodie.metrics.pushgateway.delete.on.shutdown=false \
     --hoodie-conf hoodie.metrics.pushgateway.job.name=hudi_transformed_features_accounts_hudi \
     --hoodie-conf hoodie.metrics.pushgateway.random.job.name.suffix=false \
     --hoodie-conf hoodie.metadata.enable=true \
     --hoodie-conf hoodie.metrics.reporter.metricsname.prefix=hudi \
     --target-base-path s3://bucket_name_transformed/features_accounts \
     --target-table features_accounts \
     --enable-sync \
     --hoodie-conf hoodie.datasource.hive_sync.database=hudi_transformed \
     --hoodie-conf hoodie.datasource.hive_sync.table=features_accounts \
     --sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool \
     --hoodie-conf hoodie.datasource.write.recordkey.field=id,pos \
     --hoodie-conf hoodie.datasource.write.precombine.field=id \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator \
     --hoodie-conf hoodie.datasource.write.partitionpath.field=created_at_dt \
     --hoodie-conf hoodie.datasource.hive_sync.partition_fields=created_at_dt \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor \
     --hoodie-conf hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING \
     --hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z',yyyy-MM-dd' 'HH:mm:ss.SSSSSS,yyyy-MM-dd' 'HH:mm:ss,yyyy-MM-dd'T'HH:mm:ss'Z'" \
     --hoodie-conf hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd \
     --source-ordering-field id \
     --hoodie-conf secret.key.name=some-secret \
     --hoodie-conf transformer.decrypt.cols=features_json \
     --hoodie-conf transformer.uncompress.cols=false \
     --hoodie-conf transformer.jsonToStruct.column=features_json \
     --hoodie-conf transformer.normalize.column=features_json.accounts \
     --hoodie-conf transformer.copy.fields=created_at,created_at_dt \
     --transformer-class com.custom.transform.DecryptTransformer,com.custom.transform.JsonToStructTypeTransformer,com.custom.transform.NormalizeArrayTransformer,com.custom.transform.FlatteningTransformer,com.custom.transform.CopyFieldTransformer
   ```
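   One thing worth checking (a diagnostic sketch, not part of the original report): the command puts both `hudi-utilities-bundle` and `hudi-spark3.2-bundle` on the classpath via `--packages`, so the same Hudi class can be shaded into more than one jar, which is a common source of version-conflict errors after an upgrade. The snippet below demonstrates the idea with two tiny stand-in jars (the paths and jar names are made up for the demo); it lists every entry of every jar and prints any entry name that appears in more than one jar.

   ```shell
   # Demo setup: build two small jars that share one entry (stand-ins for
   # the real Hudi bundles, which would live under e.g. ~/.ivy2/jars/).
   mkdir -p /tmp/bundle-demo && cd /tmp/bundle-demo
   echo stub > Dup.class
   echo stub > OnlyA.class
   zip -q bundle-a.jar Dup.class OnlyA.class
   zip -q bundle-b.jar Dup.class
   # List every entry of every jar, then keep only names seen more than once:
   for j in bundle-*.jar; do unzip -Z1 "$j"; done | sort | uniq -d
   # → Dup.class
   ```

   On a real cluster you would point the loop at the actual downloaded Hudi bundle jars; any class names it prints are resolved nondeterministically at runtime, which is why pinning a single bundle (or aligning all bundles to one Hudi version) is usually the first thing to try.
   
   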

