Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-10 Thread German Schiavon
OK got it! Thanks!
On Tue, 9 Mar 2021 at 21:17, Jungtaek Lim wrote:
> That property decides how many log files (log file is created per batch
> per type - types are like offsets, commits, etc.) to retain on the
> checkpoint.
>
> Unless you're struggling with a small files problem on checkpoint

Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread Jungtaek Lim
That property decides how many log files (log file is created per batch per type - types are like offsets, commits, etc.) to retain on the checkpoint. Unless you're struggling with a small files problem on checkpoint, you wouldn't need to tune the value. I guess that's why the configuration is mar
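To make the description above concrete, here is a toy model of "retain the log files of the last N batches, per log type". It is only a sketch under stated assumptions, not Spark's actual purge logic: the function names, the two-entry LOG_TYPES tuple, and the "<type>/<batchId>" file naming are all hypothetical and chosen for illustration.

```python
from typing import Dict, List

# Hypothetical subset of log types; the message above names offsets and commits.
LOG_TYPES = ("offsets", "commits")


def retained_batches(latest_batch_id: int, min_batches_to_retain: int) -> List[int]:
    """Return the batch ids whose log files this toy policy would keep.

    Assumption: batch ids start at 0 and the newest
    `min_batches_to_retain` batches are always kept.
    """
    if min_batches_to_retain < 1:
        raise ValueError("min_batches_to_retain must be at least 1")
    oldest_kept = max(0, latest_batch_id - min_batches_to_retain + 1)
    return list(range(oldest_kept, latest_batch_id + 1))


def checkpoint_layout(latest_batch_id: int, min_batches_to_retain: int) -> Dict[str, List[str]]:
    """Map each log type to the (illustrative) file names kept for it."""
    kept = retained_batches(latest_batch_id, min_batches_to_retain)
    return {log_type: [f"{log_type}/{batch}" for batch in kept] for log_type in LOG_TYPES}


if __name__ == "__main__":
    # With batches 0..9 written and a retention of 3, only 7, 8, 9 remain per type.
    for log_type, files in checkpoint_layout(9, 3).items():
        print(log_type, files)
```

In this model the number of files on the checkpoint grows with both the retention value and the number of log types, which is the link to the small files concern mentioned above. In a real job the value would be supplied through Spark's usual configuration mechanisms (for example a `--conf spark.sql.streaming.minBatchesToRetain=<n>` argument to `spark-submit`) rather than computed by hand.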

Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread German Schiavon
Hey Maxim, ok! I didn't see them. Is this property documented somewhere? Thanks!
On Tue, 9 Mar 2021 at 13:57, Maxim Gekk wrote:
> Hi German,
>
> It is used at least at:
> 1.
> https://github.com/apache/spark/blob/a093d6feefb0e086d19c86ae53bf92df12ccf2fa/sql/core/src/main/scala/org/apache/spar

Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread Maxim Gekk
Hi German,
It is used at least at:
1. https://github.com/apache/spark/blob/a093d6feefb0e086d19c86ae53bf92df12ccf2fa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala#L56
2. https://github.com/apache/spark/blob/e7e016192f882cfb430d706c2099e58e1bcc014c/s