That property controls how many log files to retain in the checkpoint (one log
file is created per batch per type; types include offsets, commits, etc.).

Unless you're struggling with a small-files problem on the checkpoint, you
shouldn't need to tune the value. I guess that's why the configuration is
marked as "internal", meaning only some admins need to know about it.

On Wed, Mar 10, 2021 at 3:58 AM German Schiavon <gschiavonsp...@gmail.com>
wrote:

> Hey Maxim,
>
> ok! I didn't see them.
>
> Is this property documented somewhere?
>
> Thanks!
>
> On Tue, 9 Mar 2021 at 13:57, Maxim Gekk <maxim.g...@databricks.com> wrote:
>
>> Hi German,
>>
>> It is used at least at:
>> 1.
>> https://github.com/apache/spark/blob/a093d6feefb0e086d19c86ae53bf92df12ccf2fa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala#L56
>> 2.
>> https://github.com/apache/spark/blob/e7e016192f882cfb430d706c2099e58e1bcc014c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L84
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>> On Tue, Mar 9, 2021 at 3:27 PM German Schiavon <gschiavonsp...@gmail.com>
>> wrote:
>>
>>> Hello all,
>>>
>>> I wanted to ask whether this property is still active. I can't find it in
>>> the doc https://spark.apache.org/docs/latest/configuration.html or anywhere
>>> in the code (only in tests).
>>>
>>> If so, should we remove it?
>>>
>>> val MIN_BATCHES_TO_RETAIN =
>>>   buildConf("spark.sql.streaming.minBatchesToRetain")
>>>     .internal()
>>>     .doc("The minimum number of batches that must be retained and made recoverable.")
>>>     .version("2.1.1")
>>>     .intConf
>>>     .createWithDefault(100)
>>>
>>>