codope commented on issue #4242:
URL: https://github.com/apache/hudi/issues/4242#issuecomment-991089546


   @Rap70r These two blogs should help in understanding clustering in Hudi:
   * [Clustering intro](https://hudi.apache.org/blog/2021/01/27/hudi-clustering-intro/)
   * [Async clustering](https://hudi.apache.org/blog/2021/08/23/async-clustering)
   
   Specifically, to cluster small files, the default values for the clustering 
plan and execution strategy configs should be good. However, the configs below 
might need some tuning specific to the dataset:
   ```
   hoodie.clustering.plan.strategy.small.file.limit
   hoodie.clustering.plan.strategy.target.file.max.bytes
   ```
   The first config determines which input file groups are eligible for 
clustering: only files smaller than the size it specifies are picked up. The 
second config sets the target maximum size of each clustered output file. You 
can read more about clustering configs 
[here](https://hudi.apache.org/docs/configurations#Clustering-Configs).
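
As a rough illustration of tuning the two configs above, here is a minimal sketch that builds them as write options. The helper name `clustering_options` and the sizes used (100 MB and 512 MB) are hypothetical and picked only for the example; they are not recommended values or Hudi defaults, and both configs are assumed to take sizes in bytes.

```python
MB = 1024 * 1024


def clustering_options(small_file_limit_mb, target_file_max_mb):
    """Build the two clustering size options as a dict of strings.

    Hypothetical helper for illustration only. Both sizes are given in
    megabytes and converted to bytes (assumed unit for these configs).
    """
    if small_file_limit_mb <= 0 or target_file_max_mb <= 0:
        raise ValueError("sizes must be positive")
    return {
        # Files smaller than this are eligible for clustering.
        "hoodie.clustering.plan.strategy.small.file.limit":
            str(small_file_limit_mb * MB),
        # Target maximum size of the files produced by clustering.
        "hoodie.clustering.plan.strategy.target.file.max.bytes":
            str(target_file_max_mb * MB),
    }


# Illustrative values only; tune them for your dataset.
opts = clustering_options(small_file_limit_mb=100, target_file_max_mb=512)
for key, value in opts.items():
    print(f"{key}={value}")
```

With PySpark, a dict like this could be passed alongside the other Hudi write options, for example via `df.write.format("hudi").options(**opts)`, assuming clustering itself is already enabled for the table.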

