[ https://issues.apache.org/jira/browse/HUDI-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-2458:
----------------------------
    Story Points: 2

> Relax compaction in metadata being fenced based on inflight requests in data table
> ----------------------------------------------------------------------------------
>
>                 Key: HUDI-2458
>                 URL: https://issues.apache.org/jira/browse/HUDI-2458
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: sivabalan narayanan
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> Relax compaction in the metadata table being fenced based on inflight
> requests in the data table.
> Compaction in the metadata table is triggered only if there are no inflight
> requests in the data table. This can cause a liveness problem: in very large
> deployments, compaction or clustering could always be in progress in the
> data table, so metadata compaction would never get a chance to run. So, we
> should see how we can relax this constraint.
>
> Proposal to remove this dependency:
> With the recent addition of the spurious deletes config, we can actually
> drop this dependency.
> As of now, we have 3 interlinked nuances.
> - Compaction in the metadata table may not kick in if there are any inflight
> operations in the data table.
> - Applying a rollback to the metadata table depends on the last compaction
> instant in the metadata table. We might even throw an exception if the
> instant being rolled back is earlier than the latest metadata compaction
> instant time.
> - Archival in the data table is fenced by the latest compaction in the
> metadata table.
>
> So, if the data timeline has any dangling inflight operation (let's say
> someone tried clustering, killed it midway, and never attempted it again),
> metadata compaction will never kick in at all. I still need to check what
> archival does with such inflight operations in the data table when it tries
> to archive nearby commits.
>
> So, with the spurious deletes support which we added recently, all of this
> can be much simplified.
> Whenever we want to apply a rollback commit, we don't need to take different
> actions based on whether the commit being rolled back was already committed
> to the metadata table or not. Just go ahead and apply the rollback. Merging
> of metadata payload records will take care of this. If the commit was
> already synced, the final merged payload may not have spurious deletes. If
> the commit being rolled back was never committed to the metadata table, the
> final merged payload may have some spurious deletes, which we can ignore.
> With this, compaction in the metadata table does not need any dependency on
> inflight operations in the data table.
> And we can loosen the dependency of archival in the data table on metadata
> table compaction as well.
> So, in summary, all 3 dependencies listed above become moot if we go with
> this approach:
> - Archival in the data table does not depend on metadata table compaction.
> - Applying a rollback to the metadata table does not care about the last
> metadata table compaction.
> - Compaction in the metadata table can proceed even if there are inflight
> operations in the data table.
>
> In particular, our logic to apply rollback metadata to the metadata table
> will become a lot simpler and easier to reason about.
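
The rollback handling proposed above could look roughly like the following sketch. It is only an illustration under stated assumptions: `MetadataMergeSketch`, `Merged`, and `applyRollback` are hypothetical names, and the file listing is modeled as a plain map from file name to size. None of this is Hudi's actual payload class or merge API. The point is that the same unconditional code path covers both a commit that was already synced to the metadata table and one that never was.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a metadata-table file listing.
// This is NOT Hudi's real payload class; all names here are illustrative.
final class MetadataMergeSketch {

    /** Outcome of a merge: the surviving files plus deletes that matched nothing. */
    static final class Merged {
        final Map<String, Long> files;
        final List<String> spuriousDeletes;

        Merged(Map<String, Long> files, List<String> spuriousDeletes) {
            this.files = files;
            this.spuriousDeletes = spuriousDeletes;
        }
    }

    /**
     * Applies a rollback unconditionally: every file written by the rolled-back
     * commit is deleted from the listing. A delete for a file the listing never
     * recorded is "spurious"; it is collected for reporting and otherwise ignored.
     */
    static Merged applyRollback(Map<String, Long> listing, List<String> rolledBackFiles) {
        Map<String, Long> files = new HashMap<>(listing); // leave the input untouched
        List<String> spurious = new ArrayList<>();
        for (String file : rolledBackFiles) {
            if (files.containsKey(file)) {
                files.remove(file);   // commit had been synced: a real delete
            } else {
                spurious.add(file);   // commit never reached the listing: ignore
            }
        }
        return new Merged(files, spurious);
    }

    public static void main(String[] args) {
        Map<String, Long> listing = new HashMap<>();
        listing.put("f1.parquet", 100L);
        listing.put("f2.parquet", 200L);

        // Case 1: the rolled-back commit (which wrote f2) was already synced.
        Merged synced = applyRollback(listing, Arrays.asList("f2.parquet"));
        System.out.println(synced.files + " spurious=" + synced.spuriousDeletes);

        // Case 2: the rolled-back commit (which wrote f3) was never synced.
        Merged unsynced = applyRollback(listing, Arrays.asList("f3.parquet"));
        System.out.println(unsynced.files + " spurious=" + unsynced.spuriousDeletes);
    }
}
```

Because neither case needs to consult the latest metadata compaction instant, this shape of logic is what would let the three dependencies above be dropped.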