pratyakshsharma commented on a change in pull request #3646:
URL: https://github.com/apache/hudi/pull/3646#discussion_r801446438



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##########
@@ -76,6 +76,11 @@
       .withDocumentation("Number of commits to retain, without cleaning. This will be retained for num_of_commits * time_between_commits "
           + "(scheduled). This also directly translates into how much data retention the table supports for incremental queries.");
 
+  public static final ConfigProperty<String> CLEANER_HOURS_RETAINED = ConfigProperty.key("hoodie.cleaner.hours.retained")
+          .defaultValue("24")
+          .withDocumentation("Number of hours for which commits need to be retained. This config provides a more flexible option as"
+          + "compared to number of commits retained for cleaning service");

Review comment:
       No, actually `KEEP_LATEST_BY_HOURS` is meant to simplify the use of 
`KEEP_LATEST_COMMITS`. The latter relies on the assumption that ingestion 
happens at regular intervals (say, every 30 minutes). In practice, this is 
not always the case. 
   
   Both policies exist so that the longest-running query can succeed. 
The end goal is to retain the files written within the last X hours. 
`KEEP_LATEST_BY_HOURS` lets users configure X directly, rather than 
working out how many commits are needed to cover those X hours. 
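
To make the difference concrete, here is a minimal sketch of the two retention rules applied to an irregular commit timeline. It is an illustration only, not the cleaner code from this PR: the class `RetentionSketch`, its methods, and the sample timestamps are all hypothetical, and it reasons about commit instants rather than file slices.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; this is NOT Hudi's cleaner implementation.
public class RetentionSketch {

    // Hours-based rule: keep every commit made within the last `hours` hours.
    public static List<Instant> retainedByHours(List<Instant> commits, Instant now, long hours) {
        Instant cutoff = now.minus(Duration.ofHours(hours));
        return commits.stream()
                .filter(c -> !c.isBefore(cutoff))
                .collect(Collectors.toList());
    }

    // Count-based rule: keep the latest `n` commits, whatever their age.
    // Assumes `commits` is sorted oldest first.
    public static List<Instant> retainedByCount(List<Instant> commits, int n) {
        int from = Math.max(0, commits.size() - n);
        return commits.subList(from, commits.size());
    }

    // Made-up irregular ingestion: commits 30h, 20h, 3h, 2h and 1h before `now`.
    public static List<Instant> sampleCommits(Instant now) {
        return List.of(
                now.minus(Duration.ofHours(30)),
                now.minus(Duration.ofHours(20)),
                now.minus(Duration.ofHours(3)),
                now.minus(Duration.ofHours(2)),
                now.minus(Duration.ofHours(1)));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-02-08T12:00:00Z");
        List<Instant> commits = sampleCommits(now);

        // Asking for 24 hours directly keeps the 20h, 3h, 2h and 1h commits.
        System.out.println(retainedByHours(commits, now, 24).size()); // 4

        // Keeping 3 commits only reaches back 3 hours on this timeline.
        List<Instant> byCount = retainedByCount(commits, 3);
        System.out.println(byCount.size()); // 3
        System.out.println(Duration.between(byCount.get(0), now).toHours()); // 3
    }
}
```

On this timeline a commit count of 3 covers only the last 3 hours, so the count would have to be re-derived whenever the ingestion cadence changes, while the hours-based rule needs no such adjustment.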
   
   To summarise, configuring the `hoodie.cleaner.hours.retained` property is 
enough for users to achieve their end goal. As for adding more documentation, 
I plan to update the 
[cleaner blog](https://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner) 
with this new policy, along with illustrations, once this PR lands. That 
should make the concepts clearer for end users. 
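
As a rough sketch of what that configuration could look like from the user side, the snippet below builds a plain `java.util.Properties` object. The key `hoodie.cleaner.hours.retained` comes from this PR; the assumption that the policy itself is selected through `hoodie.cleaner.policy` follows released Hudi versions and may not match the state of this PR. The class `CleanerConfigSketch` and its method are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper; only the property keys refer to real Hudi configs.
public class CleanerConfigSketch {

    // Builds write properties that ask the cleaner to retain the last `hours` hours.
    public static Properties cleanerProps(int hours) {
        Properties props = new Properties();
        // Assumption: the cleaning policy is chosen via hoodie.cleaner.policy.
        props.setProperty("hoodie.cleaner.policy", "KEEP_LATEST_BY_HOURS");
        // Key introduced in this PR; the retention window X, in hours.
        props.setProperty("hoodie.cleaner.hours.retained", String.valueOf(hours));
        return props;
    }

    public static void main(String[] args) {
        Properties props = cleanerProps(24);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The only number the user has to choose is the window X itself.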
   
   LMK if you have any questions. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.