That property decides how many log files to retain in the checkpoint directory; one log file is created per batch per type (types being offsets, commits, etc.).
Unless you're struggling with a small-files problem on the checkpoint, you shouldn't need to tune the value. I guess that's why the configuration is marked as "internal", meaning only some admins need to know about it.

On Wed, Mar 10, 2021 at 3:58 AM German Schiavon <gschiavonsp...@gmail.com> wrote:

> Hey Maxim,
>
> ok! I didn't see them.
>
> Is this property documented somewhere?
>
> Thanks!
>
> On Tue, 9 Mar 2021 at 13:57, Maxim Gekk <maxim.g...@databricks.com> wrote:
>
>> Hi German,
>>
>> It is used at least at:
>> 1. https://github.com/apache/spark/blob/a093d6feefb0e086d19c86ae53bf92df12ccf2fa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala#L56
>> 2. https://github.com/apache/spark/blob/e7e016192f882cfb430d706c2099e58e1bcc014c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L84
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>> On Tue, Mar 9, 2021 at 3:27 PM German Schiavon <gschiavonsp...@gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> I wanted to ask if this property is still active? I can't find it in
>>> the doc https://spark.apache.org/docs/latest/configuration.html or
>>> anywhere in the code (only in tests).
>>>
>>> If so, should we remove it?
>>>
>>> val MIN_BATCHES_TO_RETAIN =
>>>   buildConf("spark.sql.streaming.minBatchesToRetain")
>>>     .internal()
>>>     .doc("The minimum number of batches that must be retained and made
>>>       recoverable.")
>>>     .version("2.1.1")
>>>     .intConf
>>>     .createWithDefault(100)
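
For completeness, a minimal sketch of how one might set this config if checkpoint small-file pressure ever made tuning necessary. This assumes an active SparkSession named `spark`; since it is an internal config, it is set by its string key rather than through a documented API:

    // Hedged sketch, not an official recommendation: raise retention so more
    // per-batch log files survive checkpoint cleanup. The value 200 here is an
    // arbitrary illustration; the default is 100.
    spark.conf.set("spark.sql.streaming.minBatchesToRetain", "200")

Setting it via `spark.conf.set` works for SQL configs generally; it could equally be passed as `--conf spark.sql.streaming.minBatchesToRetain=200` at submit time.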