bithw1 opened a new issue #2299:
URL: https://github.com/apache/hudi/issues/2299


   Hi,
   I would like to understand what the Hudi cleaner actually does.
   
   As I see it, a cleaner could work in one of two ways, depending on the
user's business needs.
   
   1. Delete old commits along with their data. If the cleaner works this way,
the historical data belonging to those commits is deleted as well. This could
be useful when historical data has no value to the user's business, and it
might speed up reads and writes since there are fewer commits and less data.
   
   2. Merge the old commits into a new commit, and merge the data belonging to
the old commits into that new commit (like Spark's RDD checkpoint, which cuts
off a long lineage). If the cleaner works this way, the user keeps the
historical data, and since there are fewer commits, incremental reads between
commits are sped up.
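
   To make the two options concrete, here is a toy sketch in Python. It is not
Hudi code and does not claim to show what the cleaner really does: the function
names, the `retain` parameter, and the representation of a commit as a
(commit_time, records) pair are all made up for illustration.

```python
# Toy model only: each commit is (commit_time, {record_key: value}).
# None of these names come from Hudi; they are hypothetical.

def drop_old_commits(commits, retain):
    """Option 1: keep only the newest `retain` commits; older data is lost."""
    return list(commits[-retain:]) if retain > 0 else []


def merge_old_commits(commits, retain):
    """Option 2: fold everything older than the newest `retain` commits
    into a single commit, keeping the latest value per record key."""
    if retain > 0:
        old, recent = commits[:-retain], commits[-retain:]
    else:
        old, recent = commits, []
    if not old:
        return list(recent)
    merged = {}
    for _, records in old:
        merged.update(records)  # later commits overwrite earlier values
    # The merged commit reuses the time of the newest commit it absorbed.
    return [(old[-1][0], merged)] + list(recent)


commits = [
    ("001", {"a": 1}),
    ("002", {"b": 2}),
    ("003", {"a": 3}),
    ("004", {"c": 4}),
]

option_1 = drop_old_commits(commits, retain=2)
option_2 = merge_old_commits(commits, retain=2)
```

   With `retain=2`, option 1 leaves only commits 003 and 004, so record "b" is
gone. Option 2 leaves three commits, and the first one still carries "a" and
"b" from the merged history.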
   
   I want to know how the Hudi cleaner works. Thanks.

