hbgstc123 opened a new pull request, #9226:
URL: https://github.com/apache/hudi/pull/9226

   …eration when getting the oldest instant to retain for clustering from 
archival.
   
   According to the current logic of 
`ClusteringUtils#getOldestInstantToRetainForClustering`, 
   if the timeline of a hoodie table is `replace1 commit2 clean3` and the 
earliestInstantToRetain of clean3 is commit2, then replace1 is considered ready 
for archival regardless of when it completed. But if replace1 completed after 
clean3, the files replaced by replace1 have not been cleaned yet, so replace1 
should not be archived. This PR fixes that case.
   
   ### Change Logs
   
   Add logic to `ClusteringUtils#getOldestInstantToRetainForClustering` to make 
sure a replace commit is not archived if its actual completion time is later 
than the actual completion time of the latest completed clean instant.
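
A minimal sketch of the completion-time comparison described above. It is not the code in this PR and does not use Hudi's timeline classes: `Instant`, `latestCompletedClean`, and `mustRetainReplace` are hypothetical names, and times are plain longs for illustration. Retaining the replace commit when no clean has completed is also an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class ReplaceArchivalSketch {

    /** Hypothetical stand-in for a completed timeline instant; not Hudi's HoodieInstant. */
    static final class Instant {
        final String action;       // e.g. "replacecommit", "commit", "clean"
        final long requestedTime;  // position of the instant on the timeline
        final long completionTime; // when the instant actually finished

        Instant(String action, long requestedTime, long completionTime) {
            this.action = action;
            this.requestedTime = requestedTime;
            this.completionTime = completionTime;
        }
    }

    /** Finds the clean instant with the latest completion time, if any. */
    static Optional<Instant> latestCompletedClean(List<Instant> timeline) {
        return timeline.stream()
                .filter(i -> "clean".equals(i.action))
                .max(Comparator.comparingLong(i -> i.completionTime));
    }

    /**
     * Returns true when the replace commit finished after the latest completed
     * clean, so the files it replaced cannot have been cleaned yet.
     */
    static boolean mustRetainReplace(Instant replace, List<Instant> timeline) {
        return latestCompletedClean(timeline)
                .map(clean -> replace.completionTime > clean.completionTime)
                .orElse(true); // assumption: with no completed clean, keep the replace commit
    }

    public static void main(String[] args) {
        Instant commit2 = new Instant("commit", 2, 2);
        Instant clean3 = new Instant("clean", 3, 3);

        // replace1 is first on the timeline but completes after clean3.
        Instant lateReplace = new Instant("replacecommit", 1, 4);
        List<Instant> timeline = Arrays.asList(lateReplace, commit2, clean3);
        System.out.println(mustRetainReplace(lateReplace, timeline)); // true

        // replace1 completes before clean3, so this check does not block archival.
        Instant earlyReplace = new Instant("replacecommit", 1, 1);
        timeline = Arrays.asList(earlyReplace, commit2, clean3);
        System.out.println(mustRetainReplace(earlyReplace, timeline)); // false
    }
}
```

The sketch compares completion times rather than timeline order, because in the `replace1 commit2 clean3` case the replace commit comes first on the timeline yet may finish last.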
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

