n3nash commented on issue #2098:
URL: https://github.com/apache/hudi/issues/2098#issuecomment-697705248


   @ShortFinger For COW -> The number of versions to keep is a function of 
a) how frequently you run the ingestion job, which may contain updates, and 
b) how long the consumer of this table runs. So, if the consumer of this table 
runs a query lasting 1 hr, you need to keep at least the version of the file 
that was generated 1 hr ago, since this query might end up reading it. 
   If your job frequency is, let's say, 15 mins, you need to set 
COMMITS_RETAINED to 4 (60 mins / 15 mins = 4 commits within the query window), 
since in the worst case you could get an update for the same record key in 
every 15-min batch, leading to multiple versions of the same parquet file.
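
   As a rough sketch of the arithmetic above (not an official Hudi utility): 
the helper name `commits_to_retain` and its parameters are hypothetical, and 
I'm assuming COMMITS_RETAINED maps to the `hoodie.cleaner.commits.retained` 
write option, so please verify the key against your Hudi version. For MOR you 
would pass the compaction interval instead of the ingestion interval.

```python
import math


def commits_to_retain(longest_query_minutes: float, write_interval_minutes: float) -> int:
    """Worst-case number of commits that can land while the longest query runs.

    Hypothetical helper: assumes every batch may rewrite the same file, so each
    commit inside the query window can produce a new version of that file.
    """
    if longest_query_minutes <= 0 or write_interval_minutes <= 0:
        raise ValueError("durations must be positive")
    # Round up so that a partially overlapping batch is still covered.
    return max(1, math.ceil(longest_query_minutes / write_interval_minutes))


# Example from the comment: 1 hr query, ingestion every 15 mins.
retained = commits_to_retain(60, 15)

# Assumed config key; check it against the docs for your Hudi release.
hudi_options = {"hoodie.cleaner.commits.retained": str(retained)}
print(hudi_options)  # {'hoodie.cleaner.commits.retained': '4'}
```

   Rounding up is a deliberate choice here: a 50 min query with 15 min batches 
still overlaps 4 commits in the worst case, so retaining only 3 could remove a 
file version that the query is still reading.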
   
   For MOR -> The number of versions to keep is a function of how frequently 
you run the compaction process, as opposed to how frequently you run ingestion. 
Apart from that, the same logic applies here as well.
   
   Hope that provides some clarity.



