kbuci opened a new pull request, #18305: URL: https://github.com/apache/hudi/pull/18305
### Describe the issue this Pull Request addresses

Updated `logcompact` to start a heartbeat (within a transaction) before attempting to execute a plan. If multiple writers attempt to execute the same logcompact plan at the same time, only one of them will proceed (executing the plan, or rolling it back if the instant is already inflight). The rest will see that a heartbeat has already been started, fail with an exception, and abort.

### Summary and Changelog

This change applies to logcompaction the same safety enhancement that https://github.com/apache/hudi/pull/18012 made for compaction.

Note that, unlike compaction, if a logcompaction instant is already inflight, then any `logcompact` call to execute the logcompact plan will delete the instant/plan entirely (preventing it from being re-attempted by ongoing or future writes). This is by design, since logcompaction plans are intended to be mutable.

Because we intend to eventually disallow concurrent writers from attempting to roll back the same instant (https://github.com/apache/hudi/issues/18050#issuecomment-3955269910), for now we should prevent two or more writers from concurrently rolling back an inflight logcompact. If we later decide to always guarantee correct behavior for concurrent rollbacks of the same instant, we can revisit this and skip some of the heartbeating logic when the logcompaction instant is already inflight.
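
The "start a heartbeat within a transaction" guard described above can be illustrated with a small in-memory model. This is only a sketch of the idea, not Hudi code: `HeartbeatRegistry`, `HeartbeatActiveError`, `race`, the `expiry_seconds` parameter, and the use of a `threading.Lock` in place of the table lock are all hypothetical names and simplifications chosen for the example.

```python
import threading
import time


class HeartbeatActiveError(Exception):
    """Another writer already holds a live heartbeat for this instant."""


class HeartbeatRegistry:
    """Hypothetical stand-in for the table lock plus per-instant heartbeats."""

    def __init__(self, expiry_seconds, clock=time.monotonic):
        self._table_lock = threading.Lock()  # models the brief table-level transaction
        self._beats = {}                     # instant -> (writer, time heartbeat started)
        self._expiry_seconds = expiry_seconds
        self._clock = clock

    def start(self, instant, writer):
        # The check and the start happen under one lock, so two writers
        # can never both observe "no heartbeat" for the same instant.
        with self._table_lock:
            now = self._clock()
            holder = self._beats.get(instant)
            if holder is not None and now - holder[1] < self._expiry_seconds:
                raise HeartbeatActiveError(
                    f"instant {instant}: heartbeat already started by {holder[0]}"
                )
            self._beats[instant] = (writer, now)

    def stop(self, instant, writer):
        # Only the writer that owns the heartbeat may clear it.
        with self._table_lock:
            holder = self._beats.get(instant)
            if holder is not None and holder[0] == writer:
                del self._beats[instant]


def race(registry, instant, writers):
    """Let several writers try to claim the same plan concurrently."""
    outcomes = {}

    def attempt(writer):
        try:
            registry.start(instant, writer)
            outcomes[writer] = "executes"  # would run or roll back the plan here
        except HeartbeatActiveError:
            outcomes[writer] = "aborts"

    threads = [threading.Thread(target=attempt, args=(w,)) for w in writers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes
```

Calling `race(HeartbeatRegistry(expiry_seconds=60), "001", ["w1", "w2", "w3"])` in this model lets exactly one writer through and makes the others abort. A heartbeat that is never stopped (for example, after a crash) blocks later attempts only until `expiry_seconds` has elapsed.
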
### Impact

- Logcompaction plan execution will now briefly take a table lock to start a heartbeat.
- If the heartbeat is already active, the writer will fail with an exception, similar to compaction after https://github.com/apache/hudi/pull/18012.
- Rollback-failed-writes cleanup no longer targets logcompaction plans (since another writer might be executing a logcompaction .requested plan or rolling back a logcompaction .inflight).

### Risk Level

low:

- When a user creates a write client to execute a `logcompact`, if that call succeeds they are expected to then commit the logcompaction instant with the same client. Given this assumption, a logcompact execution/commit should fail with a transient `due to heartbeat by concurrent writer/job` error only in an edge case: a writer job crashed fatally before cleaning up its heartbeat, and the next writer job re-attempts to execute/roll back the logcompaction plan before the heartbeat's automatic expiry window is reached.
- An additional DFS listing call is performed when executing a logcompaction plan (to account for concurrent writers attempting to execute the same plan).

### Documentation Update

None

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
