parisni opened a new issue, #6373: URL: https://github.com/apache/hudi/issues/6373
Hudi 0.11.1

I am working on tables with a huge number of partitions (> 100k) that are almost append-only: no updates to past data, and rare deletes.

Previously I had an issue with cleaning combined with `bulk-insert`: auto-clean was very slow because it never found the previous clean commit, so it always did a full clean of all partitions. I now use the `insert` operation and expected that issue to go away, but I get the same behavior: auto-clean always processes every partition in the table.

Moreover, cleaning is much slower with the metadata table enabled (from 5 minutes without metadata to 4 hours with metadata enabled), and it gets slower still when metadata compaction has not run recently. As a result, auto-clean is not usable in my case with metadata enabled. Note that cleaning does more than remove old files; it also repairs the timeline (e.g. timed-out commits).

1. Is incremental cleaning supposed to work this way?
2. Can full-cleaning performance with metadata enabled be improved somehow (for example, by using file listing, which is faster)?
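To make question 1 concrete, here is a minimal sketch of the partition selection one would expect from incremental cleaning (`hoodie.cleaner.incremental.mode`), under a deliberately simplified timeline model. It is not Hudi's `CleanPlanner` code: `Commit`, `partitions_to_clean`, and its parameters are hypothetical names. The assumption is that, when a previous clean exists, only partitions written by commits between that clean's earliest retained commit and the new one can contain newly obsolete file versions; without a previous clean (or with incremental mode off), the cleaner falls back to listing every partition, which is the behavior reported above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Commit:
    """A completed write on the timeline (hypothetical, simplified model)."""
    instant: str  # instant time; lexicographic order == time order
    partitions: set[str] = field(default_factory=set)


def partitions_to_clean(
    commits: list[Commit],
    all_partitions: set[str],
    last_clean_retained: Optional[str],
    new_retained: str,
    incremental: bool = True,
) -> set[str]:
    """Return the partitions a clean run would need to inspect.

    Incremental path: union of partitions touched by commits whose instant
    lies in [last_clean_retained, new_retained).
    Full path (no previous clean, or incremental disabled): all partitions.
    """
    if incremental and last_clean_retained is not None:
        return {
            p
            for c in commits
            if last_clean_retained <= c.instant < new_retained
            for p in c.partitions
        }
    # Full cleaning: every partition must be listed and scanned.
    return set(all_partitions)
```

With an append-only table, the incremental path should stay small (only recently written partitions), while the full path grows with the total partition count, which is why it hurts at > 100k partitions.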
