[I] [SUPPORT] AWS Athena query fails with HIVE_UNKNOWN_ERROR on MOR table with Hudi 1.0.0 [hudi]

via GitHub Thu, 30 Jan 2025 23:45:15 -0800


nomupay-vikkikumar opened a new issue, #12750:
URL: https://github.com/apache/hudi/issues/12750


   I am creating a Hudi **MOR** table with **NBCC** concurrency mode using Hudi 
1.0.0 with Spark 3.5.0. The table is created without any error and is 
readable through Spark. But when I query this table from AWS Athena, the 
query fails with HIVE_UNKNOWN_ERROR.
   
   I'm using jars from S3 for hudi-aws-bundle, hudi-spark3.5-bundle and 
hudi-utilities-bundle, all built on top of Hudi 1.0.0.
   
   Here are the Spark Hudi options I'm using (Python):
   
   ```
   hudi_option = {
       'hoodie.table.name': 'my_hudi_mor_table',
       'hoodie.datasource.write.recordkey.field': "table_record_id",
       'hoodie.datasource.write.partitionpath.field': "partition_field",
       'hoodie.write.concurrency.mode': 'NON_BLOCKING_CONCURRENCY_CONTROL',
       'hoodie.table.type': 'MERGE_ON_READ',
       'hoodie.datasource.write.storage.type': 'MERGE_ON_READ',
       'hoodie.datasource.query.type': 'MERGE_ON_READ',
       'hoodie.datasource.write.operation': 'upsert',
       'hoodie.datasource.write.precombine.field': 'table_record_id',
       'hoodie.upsert.shuffle.parallelism': 300,
       'hoodie.insert.shuffle.parallelism': 300,
       'hoodie.delete.shuffle.parallelism': 300,
       'hoodie.index.type': "BUCKET",
       'hoodie.record.index.update.partition.path': "true",
       'hoodie.metadata.enable': "true",
       'hoodie.metadata.record.index.enable': "false",
       'hoodie.metadata.index.column.stats.enable': "false",
       
       # Hive sync settings
       'hoodie.datasource.hive_sync.database': 'my_db',
       'hoodie.datasource.hive_sync.table': 'my_hudi_mor_table',
       'hoodie.datasource.hive_sync.partition_fields': 'partition_field',
       'hoodie.datasource.write.hive_style_partitioning': 'true',
       'hoodie.datasource.hive_sync.enable': 'true',
       
       'hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled': 'true',
       'hoodie.datasource.hive_sync.support_timestamp': 'true'
   }
   ```
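
For context, a dict like `hudi_option` is normally handed to Spark's DataFrame writer. The issue does not include the write code, so the sketch below is only an illustration: `write_hudi`, `base_path`, and the `append` save mode are assumptions, not the reporter's actual job.

```python
def write_hudi(df, options, base_path, mode="append"):
    """Write a Spark DataFrame as a Hudi table.

    `df` is expected to be a pyspark.sql.DataFrame; `base_path` is a
    hypothetical table location such as an s3:// URI.
    """
    # Stringify values (e.g. the integer shuffle parallelism settings)
    # so the effective config that reaches Hudi is explicit.
    str_options = {key: str(value) for key, value in options.items()}
    (
        df.write.format("hudi")
        .options(**str_options)
        .mode(mode)
        .save(base_path)
    )
    return str_options
```

Returning the normalised options makes it easy to log exactly what was sent to the writer when comparing a failing table against a working one.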
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run a Spark application to create a Hudi MOR table with the hudi_option config above
   2. Run an Athena query to fetch data from this Hudi table
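
Step 2 can be scripted so that Athena's full failure reason is captured rather than just the error code. This is a minimal sketch assuming a boto3 Athena client is passed in; `run_athena_query` and the result output location are placeholders that do not come from the issue.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, database, sql, output_location, poll_seconds=1.0):
    """Run `sql` in Athena and return (state, reason).

    `client` is expected to be boto3.client("athena"); `output_location`
    is a hypothetical s3:// URI where Athena writes query results.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"
        ]["Status"]
        if status["State"] in TERMINAL_STATES:
            # StateChangeReason holds the error text for failed queries.
            return status["State"], status.get("StateChangeReason", "")
        time.sleep(poll_seconds)
```

For example, calling it with database `my_db` and a query such as `SELECT * FROM my_hudi_mor_table LIMIT 10` should return `FAILED` together with the message behind HIVE_UNKNOWN_ERROR, which is useful to attach to the report.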
   
   **Expected behavior**
   
   Should be able to query this Hudi MOR table from AWS Athena without any 
HIVE_UNKNOWN_ERROR.
   
   **Environment Description**
   
   * Hudi version : 1.0.0
   
   * Spark version : 3.5.0
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.6
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no


