ennox108 opened a new issue, #12593:
URL: https://github.com/apache/hudi/issues/12593

   We upgraded EMR from 6.11.1 to 7.2.0 and are trying to run a Hudi job that 
processes 4 data sources. The job succeeds for 3 of them but keeps failing 
for 1 source with the error below:
   
   
![{7905A290-A687-442A-A418-75996C36892B}](https://github.com/user-attachments/assets/4f8c97ea-58fa-46c7-9bfa-79ada38a5c35)
   
   I have tried re-ingesting the source tables used for this job as well as 
re-creating the table where the data is written.
   
   I am using the following Hudi options:
   
   hudi_options = {
       'hoodie.table.name': table_name,
       'hoodie.datasource.write.table.type': table_type or 'MERGE_ON_READ',
       'hoodie.datasource.write.table.name': table_name,
       'hoodie.datasource.write.payload.class': payload_class,
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator',
       'hoodie.datasource.write.recordkey.field': primary_keys.replace(' ', ''),
       'hoodie.datasource.write.precombine.field': precombine_key,
       'hoodie.datasource.write.partitionpath.field': 'src_db_id:SIMPLE',
       'hoodie.embed.timeline.server': False,
       'hoodie.index.type': 'BLOOM',
       'hoodie.parquet.compression.codec': 'snappy',
       'hoodie.clean.async': True,
       'hoodie.clean.max.commits': 3,
       'hoodie.parquet.max.file.size': 125829120,
       'hoodie.parquet.small.file.limit': 104857600,
       'hoodie.parquet.block.size': 125829120,
       'hoodie.metadata.enable': not overwrite,
       'hoodie.metadata.validate': True,
       'hoodie.allow.empty.commit': True,
       'hoodie.datasource.write.hive_style_partitioning': True,
       'hoodie.datasource.hive_sync.support_timestamp': True,
       'hoodie.datasource.hive_sync.jdbcurl': hive_jdbcurl,
       'hoodie.datasource.hive_sync.username': hive_username,
       'hoodie.datasource.hive_sync.password': hive_password,
       'hoodie.datasource.hive_sync.database': cdm_db,
       'hoodie.datasource.hive_sync.table': table_name,
       'hoodie.datasource.hive_sync.partition_fields': 'src_db_id',
       'hoodie.datasource.hive_sync.enable': True,
       'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
       'hoodie.compact.inline': True,
       'hoodie.compact.inline.trigger.strategy': 'NUM_OR_TIME',
       'hoodie.compact.inline.max.delta.commits': 1,
       'hoodie.compact.inline.max.delta.seconds': 3600
   }
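
To compare the effective settings between the EMR 6.11.1 and 7.2.0 runs, a dict like the one above can be rendered into sorted `key=value` lines and diffed. This is only a minimal sketch: the helper name `render_options` and the sample values are hypothetical and not part of the job. Booleans are rendered as lowercase `true`/`false`, which mirrors how PySpark stringifies boolean option values.

```python
def render_options(options):
    """Return sorted 'key=value' lines with every value rendered as a string."""
    def to_str(value):
        # bool must be checked explicitly so True becomes 'true', not 'True'
        if isinstance(value, bool):
            return str(value).lower()
        return str(value)

    return [f"{key}={to_str(options[key])}" for key in sorted(options)]


# Hypothetical subset of the options above, with placeholder values.
sample = {
    'hoodie.table.name': 'example_table',
    'hoodie.compact.inline': True,
    'hoodie.compact.inline.max.delta.commits': 1,
}

for line in render_options(sample):
    print(line)
```

Logging this output from both clusters would make it easier to spot any option whose resolved value differs between the two runs.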
   
   Application versions in use:
   EMR 7.2.0
   Spark 3.5.1
   Hadoop 3.3.6
   
   The same job runs without any issue on the old EMR release (6.11.1).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
