codope commented on issue #4242: URL: https://github.com/apache/hudi/issues/4242#issuecomment-991089546
@Rap70r These two blog posts should help in understanding clustering in Hudi:

* [Clustering intro](https://hudi.apache.org/blog/2021/01/27/hudi-clustering-intro/)
* [Async clustering](https://hudi.apache.org/blog/2021/08/23/async-clustering)

Specifically, to cluster small files, the default values for the clustering plan and execution strategy configs should work well. However, the configs below might need some tuning for your dataset:

```
hoodie.clustering.plan.strategy.small.file.limit
hoodie.clustering.plan.strategy.target.file.max.bytes
```

The first config determines which input file groups are eligible for clustering: files smaller than the size it specifies. The second config determines the size of the clustered output files. You can read more about clustering configs [here](https://hudi.apache.org/docs/configurations#Clustering-Configs).
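As a rough sketch of how the two size configs could be set together, the snippet below builds them as a plain options dictionary. The helper name `clustering_size_options`, the 300 MB and 1 GB figures, and the check that the small-file limit does not exceed the target size are all assumptions made for illustration; they are not Hudi defaults or Hudi-enforced rules, so pick values that match your own file size distribution.

```python
MB = 1024 * 1024  # bytes per mebibyte; both configs take sizes in bytes


def clustering_size_options(small_file_limit_mb, target_file_max_mb):
    """Build the two clustering size configs as string-valued options.

    Hypothetical helper for illustration only; it is not part of Hudi.
    """
    if small_file_limit_mb <= 0 or target_file_max_mb <= 0:
        raise ValueError("sizes must be positive")
    # Sanity check chosen for this sketch (not enforced by Hudi): if the
    # eligibility limit were larger than the output size, freshly clustered
    # files could immediately qualify for clustering again.
    if small_file_limit_mb > target_file_max_mb:
        raise ValueError("small file limit should not exceed the target file size")
    return {
        # Files smaller than this are eligible for clustering.
        "hoodie.clustering.plan.strategy.small.file.limit": str(small_file_limit_mb * MB),
        # Size of the files produced by clustering.
        "hoodie.clustering.plan.strategy.target.file.max.bytes": str(target_file_max_mb * MB),
    }


# Illustrative values only: treat files under 300 MB as small, write ~1 GB files.
opts = clustering_size_options(300, 1024)
for key, value in opts.items():
    print(f"{key}={value}")
```

With a Spark job, a dictionary like this could be passed alongside your other Hudi write options, for example through `.options(**opts)` on the DataFrame writer.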