cshuo commented on PR #13530: URL: https://github.com/apache/hudi/pull/13530#issuecomment-3072241488
> > One concern is whether we should avoid introducing new RPC calls as much as possible, since @danny0405 also mentioned there is a chance that sending the RPC event may fail.
> >
> > Can we commit the residual write metadata events in `handleBootstrapEvent`, after cleaning legacy events, for the restart case (attemptId > 0)? I think it can also solve the problem that happened in step 2.
> >
> > > Step 2: The task automatically restarted and relaunched. It began writing data based on Commit 2. Since the attempt count was > 0, no recommit operation was performed.
> >
> > Agreed. However, in this scenario, if a transmission failure occurs, the pending instant still won't be successfully re-committed, and thus will remain retained in the state. Therefore, this shouldn't pose significant issues.

Yes, the fix in this PR also works well even if an RPC failure occurs, but it makes writing and committing more complicated. After discussing with @danny0405 offline, we came up with another [fix](https://github.com/apache/hudi/pull/13543) on the write coordinator side: it relies on the coordinator's checkpoint to persist the uncommitted write metadata events and recommits them when the job is restored. It would be great if you could help verify it.
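
For readers following the thread, the coordinator-side idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the linked PR: `WriteCoordinatorSketch`, `handleWriteMetaEvent`, `snapshotState`, `restoreState`, and the in-memory `timeline` list are all hypothetical names, and write metadata events are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a write coordinator; none of these names are Hudi's real API.
class WriteCoordinatorSketch {
    // Stand-in for the table timeline: instants that have been committed.
    private final List<String> timeline;
    // Instant -> write metadata events received but not yet committed.
    private final Map<String, List<String>> pending = new LinkedHashMap<>();

    WriteCoordinatorSketch(List<String> timeline) {
        this.timeline = timeline;
    }

    // Buffer an event from a write task until its instant is committed.
    void handleWriteMetaEvent(String instant, String event) {
        pending.computeIfAbsent(instant, k -> new ArrayList<>()).add(event);
    }

    // Checkpoint: return a deep copy of the uncommitted events so they survive a restart.
    Map<String, List<String>> snapshotState() {
        Map<String, List<String>> snapshot = new LinkedHashMap<>();
        pending.forEach((instant, events) -> snapshot.put(instant, new ArrayList<>(events)));
        return snapshot;
    }

    // Commit an instant if it has buffered events; skip it if it is already on the timeline.
    void commit(String instant) {
        List<String> events = pending.remove(instant);
        if (events != null && !timeline.contains(instant)) {
            timeline.add(instant);
        }
    }

    // Restore: reload the checkpointed events and recommit them without any task-side RPC.
    void restoreState(Map<String, List<String>> snapshot) {
        pending.clear();
        snapshot.forEach((instant, events) -> pending.put(instant, new ArrayList<>(events)));
        // Iterate over a copy of the keys because commit() removes entries from pending.
        for (String instant : new ArrayList<>(pending.keySet())) {
            commit(instant);
        }
    }
}
```

In this sketch the recommit depends only on state the coordinator checkpointed itself, so a failed event transmission from a restarted task does not leave the instant uncommitted.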
