keerthiskating opened a new issue, #11712:
URL: https://github.com/apache/hudi/issues/11712

   **Describe the problem you faced**
   
   My Hudi job runs fine for the first 9-10 executions. Subsequent runs hang and neither succeed nor fail. I am running this on Glue 4.0 with Hudi 0.14. I have gone through the Spark UI, and it looks like the job is hanging on the `Preparing compaction metadata: gft_fact_consol_hudi_metadata` step.
   
   <img width="1485" alt="Screenshot 2024-07-31 at 1 07 15 PM" src="https://github.com/user-attachments/assets/f885dc1b-6b18-4afa-93ac-95a25edca287">
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   Below are the Hudi options used:
   
   ```python
   {
       'hoodie.table.cdc.enabled': 'true',
       'hoodie.table.cdc.supplemental.logging.mode': 'data_before_after',

       'hoodie.datasource.write.recordkey.field': 'bazaar_uuid',
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',

       'hoodie.table.name': 'gft_fact_consol_hudi',
       'hoodie.datasource.write.table.name': 'gft_fact_consol_hudi',
       'hoodie.datasource.hive_sync.table': 'gft_fact_consol_hudi',
       'hoodie.datasource.hive_sync.database': 'default',

       'hoodie.datasource.write.partitionpath.field': 'a,b,c',
       'hoodie.datasource.hive_sync.partition_fields': 'a,b,c',
       'hoodie.datasource.write.hive_style_partitioning': 'true',
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',

       'hoodie.metadata.enable': 'true',
       'hoodie.metadata.record.index.enable': 'true',
       'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS',

       # 'hoodie.parquet.small.file.limit': 104857600,
       # 'hoodie.parquet.max.file.size': 125829120,

       'hoodie.clustering.inline': 'true',
       'hoodie.clustering.inline.max.commits': '4',

       'hoodie.datasource.write.storage.type': 'COPY_ON_WRITE',
       'hoodie.datasource.write.operation': 'upsert',
       'hoodie.datasource.write.precombine.field': 'record_uuid',

       'hoodie.datasource.hive_sync.use_jdbc': 'false',
       'hoodie.datasource.hive_sync.mode': 'hms',
       'hoodie.datasource.hive_sync.support_timestamp': 'true',

       # 'hoodie.write.concurrency.mode': 'OPTIMISTIC_CONCURRENCY_CONTROL',
       # 'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
       # 'hoodie.cleaner.policy.failed.writes': 'LAZY',

       # 'hoodie.write.lock.dynamodb.table': 'fri_hudi_locks_table',
       # 'hoodie.embed.timeline.server': 'false',
       # 'hoodie.write.lock.client.wait_time_ms_between_retry': 50000,
       # 'hoodie.write.lock.wait_time_ms_between_retry': 20000,
       # 'hoodie.write.lock.wait_time_ms': 60000,
       # 'hoodie.write.lock.client.num_retries': 15,
       # 'hoodie.keep.max.commits': '7',
       # 'hoodie.keep.min.commits': '6',
       # 'hoodie.write.lock.dynamodb.region': 'us-west-2',
       # 'hoodie.write.lock.dynamodb.endpoint_url': 'dynamodb.us-west-2.amazonaws.com'
   }
   ```
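
For completeness, a minimal sketch of how options like these are typically passed to the Hudi writer in a Glue/PySpark job. The `df` DataFrame and the S3 path below are placeholders I am assuming, not taken from the actual job:

```python
# Minimal sketch, assuming a Glue/PySpark job where `df` is the source
# DataFrame; the target path is a placeholder, not from the report.
hudi_options = {
    'hoodie.table.name': 'gft_fact_consol_hudi',
    'hoodie.datasource.write.recordkey.field': 'bazaar_uuid',
    'hoodie.datasource.write.precombine.field': 'record_uuid',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.metadata.enable': 'true',
    'hoodie.metadata.record.index.enable': 'true',
    # ... plus the remaining options listed above ...
}

# The actual write call (commented out here since it needs a live
# SparkSession and data):
# df.write.format('hudi') \
#     .options(**hudi_options) \
#     .mode('append') \
#     .save('s3://<bucket>/<prefix>/gft_fact_consol_hudi')  # placeholder path
```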
   
   **Expected behavior**
   
   As per https://hudi.apache.org/docs/compaction#background, compaction should only occur for MOR tables. Any idea why it is happening for a COW table?
   
   **Environment Description**
   
   * Hudi version : 0.14
   
   * Spark version : 3.3.0
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : s3
   
   * Running on Docker? (yes/no) : 
   
   
