parisni commented on issue #6373: URL: https://github.com/apache/hudi/issues/6373#issuecomment-1213305473
For 2., I guess the reason cleaning with the metadata table is slow is file listing, not partition listing. File listing is done on the metadata table side, while partition listing is done on the file system: https://github.com/apache/hudi/blob/6e7ac457352e007939ba3c44c9dc197de7b88ed3/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L310

One way to improve this would be to have file group listing fall back to the file system (the behavior without the metadata table).

For 1., from debugging and reading the source code, incremental cleaning only kicks in once a previous clean has actually deleted files. From then on it only considers the commits that follow. I guess this is both a performance problem (in my case and in the bulk-insert case) and something that can leave old partitions uncleaned.

This is complicated to explain, and... I might be wrong.
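To make the point about incremental cleaning concrete, here is a minimal Python sketch of the behavior as described in the comment. It is not Hudi code: `Commit`, `partitions_to_scan`, and `last_clean_with_deletes` are hypothetical names, and the logic only models the commenter's reading of the planner, which may itself be wrong.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    instant: str           # commit time; assumed to sort lexicographically
    partitions: frozenset  # partitions this commit wrote to


def partitions_to_scan(commits, all_partitions, last_clean_with_deletes):
    """Return the partitions a clean run would look at (hypothetical model).

    last_clean_with_deletes is the instant of the most recent clean that
    actually deleted files, or None if no such clean has happened yet.
    """
    if last_clean_with_deletes is None:
        # No earlier clean deleted anything: every partition is scanned.
        # This is the slow path suspected in the bulk-insert case.
        return set(all_partitions)

    # Incremental path: only partitions touched by later commits are
    # considered, so a partition last written before that clean is skipped.
    touched = set()
    for commit in commits:
        if commit.instant > last_clean_with_deletes:
            touched |= commit.partitions
    return touched
```

Under these assumptions, a table where no clean has ever deleted a file pays for a full scan on every run, and once a clean has deleted files, a partition that is never written again is no longer visited.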
