[ https://issues.apache.org/jira/browse/HUDI-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo closed HUDI-3604.
---------------------------
    Resolution: Fixed

> Missing to apply rollback commits to Metadata table if rollback failed mid-way
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-3604
>                 URL: https://issues.apache.org/jira/browse/HUDI-3604
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: Ethan Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> C1, C2, C3. C4 (RB_C1) in progress.
> When C4 (i.e. the rollback of C1) is triggered, let's say the process
> crashes after deleting the data files and after deleting the commit files
> of C1 from the timeline, but before applying the rollback to the metadata
> table (MDT).
> Even if the user restarts the pipeline, there won't be any pending failed
> commits (i.e. C1) to roll back, and new commits will continue without
> regard to C4. But the metadata table will miss this rollback commit.
>
> Proposal:
> We need at least two fixes:
> a. We should clean the C1 commit files from the data table timeline only
> after applying the rollback commit to the MDT. This ensures that no commit
> files in the data table are cleaned up before the rollback is applied to
> the MDT.
> b. Whenever we check for failed commits to roll back, we should also check
> for any dangling rollback to be re-attempted. This also needs some fixes in
> the rollback executor, since the commit to roll back may not exist in the
> data table timeline at all. We still need to re-attempt the rollback and
> get it to completion (so that the metadata table can make progress with
> respect to compactions). It is not easy to tell a pending rollback from a
> dangling rollback, and I can't think of a way to detect a dangling rollback
> just by looking at the data table active timeline. Hence we have to
> re-attempt every pending rollback instant and get it to completion.
>
> Dangling rollbacks:
> Following up on the example above:
> C1, C2, C3, C4 (RB_C1) failed mid-way.
> The crash happens after deleting the data files and deleting the commit
> files in the data timeline, but before applying the rollback to the MDT.
> If the user restarts the pipeline, Hudi will check for partially failed
> commits to trigger a rollback. But since C1 was deleted from the timeline
> by C4 (RB_C1), the rollback of C1 will not kick in. So C4, i.e. RB_C1, will
> just stay in the timeline forever, since there is no other trigger that
> can take it to completion or delete it.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
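
The two fixes proposed in the issue can be illustrated with a toy in-memory model. This is only a sketch and not Hudi code: `Table`, `Crash`, `run_rollback`, `recover`, and the `crash_before` parameter are hypothetical names invented here, and the real rollback executor is considerably more involved. The sketch shows fix (a), applying the rollback to the MDT before removing the commit from the data table timeline, and fix (b), re-attempting every pending rollback on restart even when the target commit is already gone.

```python
from dataclasses import dataclass, field


class Crash(Exception):
    """Simulates the process dying part-way through a rollback."""


@dataclass
class Table:
    """Toy stand-in for a data table and its metadata table (MDT)."""
    timeline: set = field(default_factory=set)             # commits in the data table timeline
    data_files: dict = field(default_factory=dict)         # commit -> data file names
    pending_rollbacks: dict = field(default_factory=dict)  # rollback instant -> target commit
    completed_rollbacks: set = field(default_factory=set)
    mdt_applied: set = field(default_factory=set)          # rollback instants applied to the MDT


def run_rollback(table, rb_instant, target, crash_before=None):
    """Roll back `target`; every step tolerates being repeated."""
    table.pending_rollbacks[rb_instant] = target
    # Step 1: delete the data files (a no-op if they are already gone).
    table.data_files.pop(target, None)
    if crash_before == "mdt":
        raise Crash()
    # Step 2 (fix a): apply the rollback to the MDT first ...
    table.mdt_applied.add(rb_instant)
    if crash_before == "timeline":
        raise Crash()
    # Step 3: ... and only then remove the commit from the data table timeline.
    table.timeline.discard(target)
    # Step 4: mark the rollback instant as completed.
    del table.pending_rollbacks[rb_instant]
    table.completed_rollbacks.add(rb_instant)


def recover(table):
    """Fix (b): on restart, re-attempt every pending rollback instant,
    whether or not its target commit still exists in the timeline."""
    for rb_instant, target in list(table.pending_rollbacks.items()):
        run_rollback(table, rb_instant, target)
```

With this ordering, a crash before the MDT step leaves C1 in the timeline, so the failure is still visible after a restart. A rollback left dangling by the old ordering (C1 already removed, C4 still pending) is also driven to completion by `recover`, because it does not depend on finding C1 in the timeline.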