nsivabalan commented on issue #3739:
URL: https://github.com/apache/hudi/issues/3739#issuecomment-936729727


   Let me illustrate with an example.
   Archival works with the timeline, whereas the cleaner deals with data files. This difference is important for understanding how the two interact.
   
   Let's say these are the 3 config values:
   cleaner commits retained: 4
   keep.min.commits for archival: 5
   keep.max.commits for archival: 10
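In a Spark job these three values would usually be passed as Hudi write options. A minimal sketch, assuming the property names `hoodie.cleaner.commits.retained`, `hoodie.keep.min.commits` and `hoodie.keep.max.commits` (check the configuration reference for your Hudi version). The dict holds only the three knobs from this example and is not a complete write config.

```python
# Only the three knobs from the example; a real write also needs the table
# name, record key, precombine field, etc.
hudi_options = {
    "hoodie.cleaner.commits.retained": "4",  # cleaner keeps data files of the last 4 commits
    "hoodie.keep.min.commits": "5",          # archival trims the timeline down to 5 commits...
    "hoodie.keep.max.commits": "10",         # ...once it grows beyond 10 commits
}

retained = int(hudi_options["hoodie.cleaner.commits.retained"])
keep_min = int(hudi_options["hoodie.keep.min.commits"])
keep_max = int(hudi_options["hoodie.keep.max.commits"])

# The example keeps cleaner retention < archival min < archival max, so
# archival never drops a commit whose data files the cleaner still keeps.
assert retained < keep_min < keep_max
```

With PySpark, such a dict would typically be applied via `df.write.format("hudi").options(**hudi_options)`, merged with the table's other required options.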
   
   Let's say you start making commits to Hudi: C1, C2, C3, C4.
   When C5 is added, the cleaner will clean up all data files pertaining to C1.
   After this, the timeline will still show C1, C2, C3, C4, C5, but the data files for C1 will have been deleted.
   
   Then more commits happen: C6, C7, C8, C9...
   So the cleaner ensures that, except for the last 4 commits, all data files pertaining to older commits are cleaned up.
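The cleaner's rule in this example (keep data files for the newest N commits, delete the rest) can be sketched as a small Python function. This is a toy model of the bookkeeping only, not Hudi's cleaner; `split_for_cleaning` is a hypothetical name.

```python
def split_for_cleaning(commits: list[str], retained: int) -> tuple[list[str], list[str]]:
    """Toy model of the cleaner (hypothetical helper, not a Hudi API).

    Given commits in commit order, return (cleaned, kept): data files of the
    newest `retained` commits are kept, those of older commits are deleted.
    Only data files are affected; the timeline itself is left untouched.
    """
    cut = max(len(commits) - retained, 0)
    return commits[:cut], commits[cut:]


# After C5: only C1's data files are cleaned.
cleaned, kept = split_for_cleaning(["C1", "C2", "C3", "C4", "C5"], retained=4)
print(cleaned, kept)  # ['C1'] ['C2', 'C3', 'C4', 'C5']

# After C10: data files of C1..C6 are cleaned.
cleaned, kept = split_for_cleaning([f"C{i}" for i in range(1, 11)], retained=4)
print(cleaned)  # ['C1', 'C2', 'C3', 'C4', 'C5', 'C6']
```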
   
   After C10, here is what storage looks like:
   C1 through C10 are in the timeline.
   All data files pertaining to C1 through C6 are cleaned up; the data files of the remaining commits (C7 to C10) are intact.
   
   After C11, archival kicks in. Since there are now 11 commits, which exceeds the keep.max.commits value of 10,
   archival will remove C1 to C6 from the timeline, leaving it with keep.min.commits (i.e. 5) commits.
   So, here is what the timeline looks like after archival:
   C7, C8, C9, C10, C11.
   Then the cleaner kicks in and cleans up the data files pertaining to C7, since only the last 4 commits (C8 to C11) are retained.
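The whole walk-through can be replayed with a toy model that keeps the two concerns separate: archival trims the timeline, the cleaner trims data files. This is a sketch under the example's numbers only; `ToyHudiTable` is a hypothetical class, and real Hudi tracks far more state than one entry per commit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHudiTable:
    """Toy replay of the example (hypothetical class, not Hudi code).

    `timeline` is what archival operates on; `data_files` lists the commits
    whose data files still exist, which is what the cleaner operates on.
    """
    cleaner_commits_retained: int = 4
    keep_min_commits: int = 5
    keep_max_commits: int = 10
    timeline: list[str] = field(default_factory=list)
    data_files: list[str] = field(default_factory=list)

    def commit(self, name: str) -> None:
        self.timeline.append(name)
        self.data_files.append(name)
        # Order follows the description above (archival, then cleaner); in this
        # toy model the two steps touch different lists, so order doesn't matter.
        self._archive()
        self._clean()

    def _archive(self) -> None:
        # Once the timeline grows beyond keep_max_commits, trim it down to the
        # newest keep_min_commits entries. Data files are not touched.
        if len(self.timeline) > self.keep_max_commits:
            start = max(len(self.timeline) - self.keep_min_commits, 0)
            self.timeline = self.timeline[start:]

    def _clean(self) -> None:
        # Delete data files of all but the newest N commits. Timeline untouched.
        cut = max(len(self.data_files) - self.cleaner_commits_retained, 0)
        self.data_files = self.data_files[cut:]


table = ToyHudiTable()
for i in range(1, 11):
    table.commit(f"C{i}")
print(table.timeline[0], table.timeline[-1], table.data_files)  # C1 C10 ['C7', 'C8', 'C9', 'C10']

table.commit("C11")
print(table.timeline)    # ['C7', 'C8', 'C9', 'C10', 'C11']
print(table.data_files)  # ['C8', 'C9', 'C10', 'C11']
```

Because cleaner retention (4) is below keep.min.commits (5), archival in this model only ever removes commits whose data files are already gone.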
   
   Hope this clarifies things. Let me know if you need more details. 