[ https://issues.apache.org/jira/browse/HUDI-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenning Ding updated HUDI-4278:
-------------------------------
Description:
The issue is that each time Hudi upserts records, it syncs to the catalog and
updates {{last_commit_time_sync}} on the Glue table. Every time this property
is updated, Glue by default creates a new table version and archives the
previous one. As a result, customers who update a Hudi table frequently
eventually hit the Glue table version limit.
To address this, Hudi passes a {{skipGlueArchive}} parameter in the
environment context, which is forwarded to the {{AWS Glue metadata service}},
so the Glue client has an option to decide whether to skip archiving.
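To make the mechanism concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not Hudi's or AWS's implementation: {{EnvironmentContext}}, {{FakeGlueTable}}, {{alter_table}} and {{sync_commits}} are hypothetical stand-ins, and the property key {{skipGlueArchive}} is taken from the description above, while the way the real Glue client reads it is assumed. For reference, the Glue {{UpdateTable}} API itself accepts a {{SkipArchive}} boolean that suppresses creation of the archived version.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Key named in the issue; how the real Glue client looks it up is an assumption.
SKIP_ARCHIVE_KEY = "skipGlueArchive"


@dataclass
class EnvironmentContext:
    """Hypothetical stand-in for Hive's EnvironmentContext: string properties."""
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class FakeGlueTable:
    """Hypothetical in-memory Glue table that archives a version on each update."""
    parameters: Dict[str, str] = field(default_factory=dict)
    archived_versions: List[Dict[str, str]] = field(default_factory=list)

    def update(self, new_params: Dict[str, str], skip_archive: bool = False) -> None:
        # Default Glue behavior being modeled: copy the old definition first.
        if not skip_archive:
            self.archived_versions.append(dict(self.parameters))
        self.parameters.update(new_params)


def alter_table(table: FakeGlueTable, new_params: Dict[str, str],
                context: EnvironmentContext) -> None:
    """Hypothetical metastore-client hook that honors the context flag."""
    skip = context.properties.get(SKIP_ARCHIVE_KEY, "false").lower() == "true"
    table.update(new_params, skip_archive=skip)


def sync_commits(commit_times: List[str], skip_archive: bool) -> FakeGlueTable:
    """Simulate one meta sync per upsert, each bumping last_commit_time_sync."""
    table = FakeGlueTable()
    context = EnvironmentContext()
    if skip_archive:
        context.properties[SKIP_ARCHIVE_KEY] = "true"
    for commit_time in commit_times:
        alter_table(table, {"last_commit_time_sync": commit_time}, context)
    return table


if __name__ == "__main__":
    commits = ["20220615100000", "20220615100100", "20220615100200"]
    default = sync_commits(commits, skip_archive=False)
    skipped = sync_commits(commits, skip_archive=True)
    print(len(default.archived_versions), len(skipped.archived_versions))
```

With three commits, the default path accumulates three archived versions while the flagged path accumulates none; the current {{last_commit_time_sync}} value is the same in both cases, so skipping the archive only drops version history, which is exactly what grows toward the limit.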
> Add skip archive option when syncing to AWS Glue tables
> -------------------------------------------------------
>
> Key: HUDI-4278
> URL: https://issues.apache.org/jira/browse/HUDI-4278
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Wenning Ding
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)