kbuci opened a new pull request, #18305:
URL: https://github.com/apache/hudi/pull/18305

   ### Describe the issue this Pull Request addresses
   
   Updated `logcompact` to start a heartbeat (within a transaction) before 
attempting to execute a plan.
   
   
   If multiple writers attempt to execute the same logcompact plan at the same 
time, only one of them will proceed (or roll back the instant if it is already 
inflight). The rest will see that a heartbeat has already been started, fail 
with an exception, and abort.
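   
   The guard described above can be sketched roughly as follows. This is a 
minimal illustration, not the code in this PR: the class 
`LogCompactHeartbeatSketch`, its methods, and the in-memory map are all 
hypothetical stand-ins for the table lock and the heartbeat storage.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: none of these names are Hudi APIs.
class LogCompactHeartbeatSketch {
    // Stands in for the table-level lock taken by the transaction.
    private final ReentrantLock tableLock = new ReentrantLock();
    // Instant time -> id of the writer that holds an active heartbeat.
    private final Map<String, String> activeHeartbeats = new HashMap<>();

    // Starts a heartbeat for the instant, or throws if one is already active.
    void startHeartbeatOrFail(String instantTime, String writerId) {
        tableLock.lock();
        try {
            String owner = activeHeartbeats.get(instantTime);
            if (owner != null) {
                throw new IllegalStateException(
                    "Cannot execute logcompaction " + instantTime
                        + " due to heartbeat by concurrent writer " + owner);
            }
            activeHeartbeats.put(instantTime, writerId);
        } finally {
            tableLock.unlock();
        }
    }

    // Releases the heartbeat once the writer has committed or rolled back.
    void stopHeartbeat(String instantTime) {
        tableLock.lock();
        try {
            activeHeartbeats.remove(instantTime);
        } finally {
            tableLock.unlock();
        }
    }

    public static void main(String[] args) {
        LogCompactHeartbeatSketch table = new LogCompactHeartbeatSketch();
        table.startHeartbeatOrFail("001", "writer-1");
        try {
            // A second writer targeting the same plan is rejected.
            table.startHeartbeatOrFail("001", "writer-2");
        } catch (IllegalStateException e) {
            System.out.println("writer-2 aborted: " + e.getMessage());
        }
        table.stopHeartbeat("001");
    }
}
```

   The check and the registration happen under the same lock, so two writers 
cannot both observe "no heartbeat" and proceed.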
   
   ### Summary and Changelog
   
   This change applies the same safety enhancement made for compaction in 
https://github.com/apache/hudi/pull/18012 to logcompaction.
   
   Note that, unlike compaction, if a logcompaction instant is already 
inflight then any `logcompact` call to execute the logcompact plan will delete 
the instant/plan entirely (preventing it from being re-attempted by ongoing 
or future writes). This is by design, since logcompaction plans are intended to 
be mutable. Because we intend in the future to disallow concurrent writers 
from attempting to roll back the same instant 
(https://github.com/apache/hudi/issues/18050#issuecomment-3955269910), for now 
we should prevent 2+ writers from concurrently rolling back an inflight 
logcompact. If we later decide to always guarantee correct behavior of 
concurrent rollbacks against the same instant, we can revisit this and skip 
some of the heartbeating logic when the logcompaction instant is already 
inflight.
   
   
   ### Impact
   
   - Logcompaction plan execution will now briefly take a table lock to start a 
heartbeat
   - If the heartbeat is already active, the writer will fail with an 
exception, similar to compaction in https://github.com/apache/hudi/pull/18012 
   - The cleanup that rolls back failed writes is prevented from targeting 
logcompaction plans (since another writer might be executing a logcompaction 
.requested plan or rolling back a logcompaction .inflight instant)
   
   
   
   ### Risk Level
   
   low: 
   - When a user creates a write client to execute a `logcompact`, if that call 
succeeds they are expected to then commit the logcompaction instant with the 
same client. Given this assumption, we should only see occurrences of a 
logcompact execution/commit failing with a transient `due to heartbeat by 
concurrent writer/job` error in an edge case where a writer job fatally crashed 
before cleaning up its heartbeat and the next writer job re-attempts to 
execute/rollback the logcompaction plan before the heartbeat's automatic expiry 
window is reached.
   - An additional DFS listing call is performed when executing a logcompaction 
plan (to account for concurrent writers attempting to execute the same plan)
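   
   The edge case in the first bullet can be illustrated with a small sketch. 
It assumes a heartbeat counts as active until a fixed expiry window has passed 
since its last update. The class `HeartbeatExpirySketch`, the `tryStart` 
method, and the 1000 ms window are hypothetical and are not taken from Hudi.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: models a heartbeat that expires after a fixed window.
class HeartbeatExpirySketch {
    private final long expiryMillis;
    // Instant time -> timestamp (ms) of the last heartbeat update.
    private final Map<String, Long> lastBeatMillis = new HashMap<>();

    HeartbeatExpirySketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // Returns true if the caller may take over the instant at time nowMillis.
    synchronized boolean tryStart(String instantTime, long nowMillis) {
        Long last = lastBeatMillis.get(instantTime);
        if (last != null && nowMillis - last < expiryMillis) {
            // A heartbeat left by a crashed writer still looks active.
            return false;
        }
        lastBeatMillis.put(instantTime, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        HeartbeatExpirySketch heartbeats = new HeartbeatExpirySketch(1000L);
        // Writer 1 starts at t=0 and then crashes without cleaning up.
        System.out.println(heartbeats.tryStart("001", 0L));
        // A retry inside the window hits the transient failure.
        System.out.println(heartbeats.tryStart("001", 500L));
        // A retry after the window has passed succeeds.
        System.out.println(heartbeats.tryStart("001", 1500L));
    }
}
```

   Under this model the failure is transient: a retry after the expiry window 
succeeds without manual cleanup.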
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable

