n3nash commented on issue #2098: URL: https://github.com/apache/hudi/issues/2098#issuecomment-697705248
@ShortFinger For COW: the number of versions to keep is a function of (a) how frequently you run the ingestion job, which may contain updates, and (b) how long the longest-running consumer of this table takes. If a consumer runs a query lasting 1 hour, you need to keep at least the version of the file that was generated 1 hour ago, since that query might still be reading it. If your job runs every 15 minutes, say, you need to set COMMITS_RETAINED to 4, since in the worst case you could get an update for the same record key in every 15-minute batch, leading to multiple versions of the same parquet file.

For MOR: the number of versions to keep is a function of how frequently you run the compaction process rather than how frequently you run ingestion. Apart from that, the same logic applies.

Hope that provides some clarity.
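To make the arithmetic above concrete, here is a rough sketch of the rule of thumb. The helper name `commits_to_retain` and the 30-minute compaction interval are made up for illustration and are not part of Hudi; I am also assuming COMMITS_RETAINED refers to the `hoodie.cleaner.commits.retained` cleaner setting.

```python
import math


def commits_to_retain(longest_query_minutes: float, write_interval_minutes: float) -> int:
    """Estimate how many commits the cleaner should retain.

    Hypothetical helper, not a Hudi API. It assumes the worst case from the
    comment above: every write rewrites the same file, so each write inside
    the query window produces one more file version that must be kept.

    For COW, pass the ingestion interval as write_interval_minutes.
    For MOR, pass the compaction interval instead.
    """
    if longest_query_minutes <= 0 or write_interval_minutes <= 0:
        raise ValueError("durations must be positive")
    # Round up so a query that spans a partial interval is still covered.
    return max(1, math.ceil(longest_query_minutes / write_interval_minutes))


# COW: 1-hour query, ingestion every 15 minutes (the example from the comment).
cow_retained = commits_to_retain(60, 15)

# MOR: same 1-hour query, compaction assumed to run every 30 minutes.
mor_retained = commits_to_retain(60, 30)

print(f"COW commits to retain: {cow_retained}")
print(f"MOR commits to retain: {mor_retained}")
```

Treat the result as a lower bound under these assumptions; you may want some headroom if query durations vary.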
