parisni opened a new issue, #6373:
URL: https://github.com/apache/hudi/issues/6373

   hudi 0.11.1
   
   I am working on tables with a huge number of partitions (> 100k) that are 
almost append-only: no updates to past data, and deletes are rare.
   
   Previously I had issues with cleaning combined with `bulk_insert`: 
auto-clean was very slow because it never found the previous clean commit and 
always ran a full clean of all partitions.
   
   Now I am using the `insert` operation and was expecting no such issue. But I 
get the same behavior: auto-clean always processes every partition in the table.
   
   Moreover, cleaning is far slower with metadata enabled (about 5 minutes 
without metadata versus 4 hours with it), and it gets slower still when 
metadata compaction has not run recently. As a result, auto-clean is not usable 
in my case with metadata enabled.
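
   A minimal sketch of the writer options this scenario involves, assuming the 
standard Hudi Spark datasource option keys. The helper `build_hudi_options` and 
the table name `events` are hypothetical, and the values are illustrative 
rather than the exact configuration used here.

```python
def build_hudi_options(table_name, metadata_enabled=True, incremental_clean=True):
    """Build a dict of Hudi write options (all values are strings).

    Hypothetical helper for illustration; only the option keys come from Hudi.
    """
    def flag(value):
        # Hudi options are passed as strings, so map booleans to "true"/"false".
        return "true" if value else "false"

    return {
        "hoodie.table.name": table_name,
        # Plain insert rather than bulk_insert.
        "hoodie.datasource.write.operation": "insert",
        # Run the cleaner automatically after each commit.
        "hoodie.clean.automatic": "true",
        # Ask the cleaner to look only at partitions touched since the last clean.
        "hoodie.cleaner.incremental.mode": flag(incremental_clean),
        # Toggle the metadata table, to compare cleaning time with and without it.
        "hoodie.metadata.enable": flag(metadata_enabled),
    }


# Example: same write settings, metadata table switched off for comparison.
options = build_hudi_options("events", metadata_enabled=False)
```

   Such a dict would typically be passed to a Spark writer, for example with 
`df.write.format("hudi").options(**options)`, which makes it easy to rerun the 
same write with only the metadata flag changed.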
   
   By the way, cleaning serves several purposes, such as removing old files but 
also repairing the timeline (e.g. timed-out commits).
   
   
   1. Is incremental cleaning supposed to work that way?
   2. Can the performance of full cleaning with metadata enabled be improved 
somehow (for example, by using file listing, which is faster)?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

