hbgstc123 opened a new pull request, #9226: URL: https://github.com/apache/hudi/pull/9226
…eration when getting the oldest instant to retain for clustering from archival.

Under the current logic of `ClusteringUtils#getOldestInstantToRetainForClustering`, take a hoodie table whose timeline is `replace1 commit2 clean3`, where the `earliestInstantToRetain` of `clean3` is `commit2`. In that case `replace1` is considered ready for archival no matter when it completed. However, if `replace1` completed after `clean3`, the file groups it replaced have not been cleaned yet, so `replace1` should not be archived. This PR fixes that case.

### Change Logs

Add logic to `ClusteringUtils#getOldestInstantToRetainForClustering` so that a replace commit is not archived if its actual completion time is later than the actual completion time of the latest completed clean instant.

### Impact

none

### Risk level (write none, low, medium or high below)

low

### Documentation Update

none

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
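The sketch below models the archival check described in this PR so the ordering problem is easier to see. It is a minimal illustration under stated assumptions, not Hudi's implementation. The `Instant` and `CleanInfo` records, the `safeToArchive` and `oldestInstantToRetain` helpers, and the use of plain `long` values for instant times are all hypothetical simplifications. Action names are plain strings. The sketch checks only two conditions:

- the existing one: the replace commit was requested before the clean's `earliestInstantToRetain`;
- the added one: the replace commit did not complete after the latest completed clean.

```java
import java.util.List;
import java.util.Optional;

// Minimal sketch, NOT Hudi code: all types and method names here are hypothetical.
final class ClusteringArchivalSketch {

    // Hypothetical, simplified timeline entry; instant times are plain longs.
    record Instant(String action, long requestedTime, long completionTime) {}

    // Hypothetical summary of the latest completed clean instant.
    record CleanInfo(long completionTime, long earliestInstantToRetain) {}

    // A replace commit may be archived only when both conditions hold.
    static boolean safeToArchive(Instant replace, CleanInfo lastClean) {
        // Existing condition: requested before the clean's earliestInstantToRetain.
        boolean beforeRetainBoundary =
                replace.requestedTime() < lastClean.earliestInstantToRetain();
        // Added condition: completed no later than the clean, so the clean could
        // already see (and delete) the file groups this replace commit replaced.
        boolean completedByClean =
                replace.completionTime() <= lastClean.completionTime();
        return beforeRetainBoundary && completedByClean;
    }

    // Requested time of the oldest replace commit archival must keep, if any.
    static Optional<Long> oldestInstantToRetain(List<Instant> timeline,
                                                Optional<CleanInfo> lastClean) {
        return timeline.stream()
                .filter(i -> i.action().equals("replacecommit"))
                // With no completed clean yet, no replace commit is safe to archive.
                .filter(i -> lastClean.isEmpty() || !safeToArchive(i, lastClean.get()))
                .map(Instant::requestedTime)
                .min(Long::compare);
    }

    public static void main(String[] args) {
        // Timeline "replace1 commit2 clean3": replace1 is requested first (t=1)
        // but completes at t=5, after clean3 completes at t=4.
        List<Instant> timeline = List.of(
                new Instant("replacecommit", 1, 5),
                new Instant("commit", 2, 2),
                new Instant("clean", 3, 4));
        CleanInfo clean3 = new CleanInfo(4, 2); // earliestInstantToRetain = commit2
        System.out.println(oldestInstantToRetain(timeline, Optional.of(clean3))); // Optional[1]
    }
}
```

With the `replace1 commit2 clean3` example, where `replace1` completes after `clean3`, `main` prints `Optional[1]`. That means `replace1` stays on the active timeline. Suppose `replace1` had instead completed before `clean3` (for example, with a completion time of 3). Both conditions would then hold, and the method would return `Optional.empty`.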
