ennox108 opened a new issue, #12593: URL: https://github.com/apache/hudi/issues/12593
We upgraded EMR from 6.11.1 to 7.2.0 and are trying to run a Hudi job that processes 4 data sources. I am able to execute the job for 3 of the data sources, but it keeps failing for 1 source with the error below. I have tried re-ingesting the source tables used for this job as well as re-creating the table where the data is written.

I am using the following Hudi options:

```python
hudi_options = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.table.type': table_type or 'MERGE_ON_READ',
    'hoodie.datasource.write.table.name': table_name,
    'hoodie.datasource.write.payload.class': payload_class,
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator',
    'hoodie.datasource.write.recordkey.field': primary_keys.replace(' ', ''),
    'hoodie.datasource.write.precombine.field': precombine_key,
    'hoodie.datasource.write.partitionpath.field': 'src_db_id:SIMPLE',
    'hoodie.embed.timeline.server': False,
    'hoodie.index.type': 'BLOOM',
    'hoodie.parquet.compression.codec': 'snappy',
    'hoodie.clean.async': True,
    'hoodie.clean.max.commits': 3,
    'hoodie.parquet.max.file.size': 125829120,
    'hoodie.parquet.small.file.limit': 104857600,
    'hoodie.parquet.block.size': 125829120,
    'hoodie.metadata.enable': not overwrite,
    'hoodie.metadata.validate': True,
    'hoodie.allow.empty.commit': True,
    'hoodie.datasource.write.hive_style_partitioning': True,
    'hoodie.datasource.hive_sync.support_timestamp': True,
    'hoodie.datasource.hive_sync.jdbcurl': hive_jdbcurl,
    'hoodie.datasource.hive_sync.username': hive_username,
    'hoodie.datasource.hive_sync.password': hive_password,
    'hoodie.datasource.hive_sync.database': cdm_db,
    'hoodie.datasource.hive_sync.table': table_name,
    'hoodie.datasource.hive_sync.partition_fields': 'src_db_id',
    'hoodie.datasource.hive_sync.enable': True,
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.compact.inline': True,
    'hoodie.compact.inline.trigger.strategy': 'NUM_OR_TIME',
    'hoodie.compact.inline.max.delta.commits': 1,
    'hoodie.compact.inline.max.delta.seconds': 3600
}
```

Environment being used:
- EMR 7.2.0
- Spark 3.5.1
- Hadoop 3.3.6

The same job works without any issue on the old EMR version.
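For context, here is a minimal sketch of how options like these are typically applied in a PySpark write; `source_path` and `base_path` are placeholder names, not paths from the failing job, and `hudi_options` refers to the dict shown above:

```python
from pyspark.sql import SparkSession

# Placeholder session and input; the actual job reads from its own sources.
spark = SparkSession.builder.appName("hudi-ingest").getOrCreate()
df = spark.read.parquet(source_path)

(
    df.write
      .format("hudi")
      .options(**hudi_options)   # the Hudi options dict listed above
      .mode("append")            # upsert into the existing MERGE_ON_READ table
      .save(base_path)           # placeholder for the table's storage location
)
```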