pratyakshsharma commented on a change in pull request #3646: URL: https://github.com/apache/hudi/pull/3646#discussion_r801446438
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##########
@@ -76,6 +76,11 @@
       .withDocumentation("Number of commits to retain, without cleaning. This will be retained for num_of_commits * time_between_commits "
           + "(scheduled). This also directly translates into how much data retention the table supports for incremental queries.");
 
+  public static final ConfigProperty<String> CLEANER_HOURS_RETAINED = ConfigProperty.key("hoodie.cleaner.hours.retained")
+      .defaultValue("24")
+      .withDocumentation("Number of hours for which commits need to be retained. This config provides a more flexible option as "
+          + "compared to number of commits retained for cleaning service");

Review comment:
No, actually `KEEP_LATEST_BY_HOURS` is meant to simplify the use of `KEEP_LATEST_COMMITS`. The latter rests on the firm assumption that ingestion happens at regular intervals (say, every 30 minutes). In practice, this is not always the case. Both policies exist so that the longest-running query can succeed; the end goal is to retain the files generated over the last X hours. `KEEP_LATEST_BY_HOURS` lets users configure X directly, rather than counting how many commits it takes to cover those X hours.

To summarise, configuring the `hoodie.cleaner.hours.retained` property is enough for users to achieve their end goal.

Regarding more documentation, I am planning to update the [cleaner blog](https://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner) with this new policy, along with illustrations, once this PR lands. That should make the concepts clearer for end users. LMK if you have any questions.
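
To illustrate the point about irregular ingestion, here is a minimal sketch in plain Java. It is a simplified model, not Hudi's cleaner implementation: the class `CleanerRetentionSketch`, its methods, and the sample commit times are all hypothetical and exist only for this example. It compares what a count-based policy and an hours-based policy would retain when commits arrive in a burst.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Simplified model for illustration only; this is NOT Hudi's cleaner code.
final class CleanerRetentionSketch {

    // Hypothetical sample: commit times (oldest first) with irregular gaps,
    // ending in a burst of six commits within the last hour.
    static List<Instant> sampleCommitTimes(Instant now) {
        long[] minutesAgo = {600, 480, 300, 180, 50, 40, 30, 20, 10, 0};
        List<Instant> times = new ArrayList<>();
        for (long m : minutesAgo) {
            times.add(now.minus(Duration.ofMinutes(m)));
        }
        return times;
    }

    // Count-based retention: keep only the latest `commitsRetained` commits.
    static List<Instant> keepLatestCommits(List<Instant> commitTimes, int commitsRetained) {
        int from = Math.max(0, commitTimes.size() - commitsRetained);
        return new ArrayList<>(commitTimes.subList(from, commitTimes.size()));
    }

    // Time-based retention: keep every commit made at or after (now - hoursRetained).
    static List<Instant> keepLatestByHours(List<Instant> commitTimes, Instant now, long hoursRetained) {
        Instant cutoff = now.minus(Duration.ofHours(hoursRetained));
        return commitTimes.stream()
                .filter(t -> !t.isBefore(cutoff))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arbitrary fixed "now" so the example is deterministic.
        Instant now = Instant.parse("2022-02-08T12:00:00Z");
        List<Instant> commits = sampleCommitTimes(now);

        // Goal: cover 6 hours. Assuming one commit per hour suggests retaining 6 commits.
        List<Instant> byCount = keepLatestCommits(commits, 6);
        List<Instant> byHours = keepLatestByHours(commits, now, 6);

        System.out.println("By count: oldest retained commit is "
                + Duration.between(byCount.get(0), now).toMinutes() + " minutes old");
        System.out.println("By hours: oldest retained commit is "
                + Duration.between(byHours.get(0), now).toMinutes() + " minutes old");
    }
}
```

In this sample, retaining 6 commits reaches back only 50 minutes because of the burst, while retaining 6 hours reaches back to the commit made 300 minutes ago. The user states the duration once and does not need to know the ingestion cadence.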