prashantwason commented on PR #18089: URL: https://github.com/apache/hudi/pull/18089#issuecomment-4030073724
@nsivabalan Good question. Here are the scenarios where a rollback can start for an ongoing commit:

1. **Heartbeat expiry**: Writer A starts a commit but becomes slow (e.g., a long GC pause, network issues, or slow I/O), and its heartbeat expires. Writer B (or a table service) sees the stale inflight commit and initiates a rollback to clean it up. Meanwhile, Writer A recovers and tries to complete its commit.
2. **Multi-writer setups**: In OCC-based multi-writer scenarios, one writer may decide to roll back another writer's inflight commit (e.g., via lazy rollback of failed instants). If the original writer is still actively working on that commit, both operations can proceed concurrently.
3. **Manual intervention**: An operator manually triggers a rollback of what appears to be a stuck commit, but the writer is actually still making progress.

In all these cases, without this PR, the rollback and the commit proceed without either side detecting the conflict, which can lead to data inconsistency (e.g., the commit finishes writing data that the rollback is simultaneously cleaning up).

This PR adds conflict detection: when the writer tries to complete its commit, it checks for a concurrent rollback targeting its own commit and, if one exists, fails fast with a `HoodieWriteConflictException`.
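A minimal sketch of the kind of pre-commit check described above, assuming a simplified in-memory timeline that only tracks rollback instants and the commit each one targets. `Timeline`, `RollbackInstant`, `checkNoConcurrentRollback`, and `WriteConflictException` are all hypothetical names chosen for illustration; they are not Hudi's API, the real exception is `HoodieWriteConflictException`, and the PR's actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Minimal sketch of a pre-commit "is my instant being rolled back?" check.
// All names here are hypothetical; this is not Hudi's actual API.
public class RollbackConflictCheck {

    // Hypothetical stand-in for Hudi's HoodieWriteConflictException.
    static class WriteConflictException extends RuntimeException {
        WriteConflictException(String message) {
            super(message);
        }
    }

    // Lifecycle states of a rollback instant on the simplified timeline.
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    // Hypothetical rollback instant: its own time plus the commit it rolls back.
    static final class RollbackInstant {
        final String instantTime;
        final String targetCommitTime;
        final State state;

        RollbackInstant(String instantTime, String targetCommitTime, State state) {
            this.instantTime = instantTime;
            this.targetCommitTime = targetCommitTime;
            this.state = state;
        }
    }

    // Hypothetical in-memory timeline that only tracks rollback instants.
    static final class Timeline {
        private final List<RollbackInstant> rollbacks = new ArrayList<>();

        void addRollback(RollbackInstant rollback) {
            rollbacks.add(rollback);
        }

        List<RollbackInstant> rollbacks() {
            return rollbacks;
        }
    }

    // Finds any rollback, in any state, whose target is the given commit.
    static Optional<RollbackInstant> findRollbackTargeting(Timeline timeline, String commitTime) {
        return timeline.rollbacks().stream()
                .filter(r -> r.targetCommitTime.equals(commitTime))
                .findFirst();
    }

    // Run by the writer right before marking its own instant completed:
    // if a rollback of this commit has been requested, is inflight, or has
    // finished, fail fast instead of completing.
    static void checkNoConcurrentRollback(Timeline timeline, String ownCommitTime) {
        Optional<RollbackInstant> conflict = findRollbackTargeting(timeline, ownCommitTime);
        if (conflict.isPresent()) {
            RollbackInstant r = conflict.get();
            throw new WriteConflictException("Commit " + ownCommitTime
                    + " is targeted by rollback " + r.instantTime + " (" + r.state + ")");
        }
    }

    public static void main(String[] args) {
        Timeline timeline = new Timeline();

        // No rollback yet: Writer A may complete commit "001".
        checkNoConcurrentRollback(timeline, "001");

        // Writer B sees an expired heartbeat and starts rolling back "001".
        timeline.addRollback(new RollbackInstant("002", "001", State.INFLIGHT));

        // Writer A recovers and tries to complete "001"; the check now throws.
        try {
            checkNoConcurrentRollback(timeline, "001");
            System.out.println("completed");
        } catch (WriteConflictException e) {
            System.out.println("conflict: " + e.getMessage());
        }
    }
}
```

In a real multi-writer deployment, the check and the transition to completed would presumably need to happen atomically (for example, under the lock already used for OCC); otherwise a rollback could still be scheduled in the window between the check and the completion.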
