Could you share the logs so we can check for possible failures to subsume or 
remove previous checkpoints?
What are the sizes of the files? That would help to understand how compaction 
is going.
Could you also provide more details on how you set up TtlDB with Flink?
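
In case it helps to collect the file sizes, here is a minimal sketch that 
summarizes the regular files in one directory, for example the "shared" 
directory. It assumes the directory is reachable as a local or mounted path. 
The class name SharedDirStats and its summarize method are hypothetical 
helpers, not part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LongSummaryStatistics;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: summarizes the files in one directory.
class SharedDirStats {

    // Returns count, sum, min and max of the sizes (in bytes) of the regular
    // files directly under dir. Subdirectories are ignored.
    static LongSummaryStatistics summarize(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .mapToLong(p -> p.toFile().length())
                    .summaryStatistics();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of the "shared" directory; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        LongSummaryStatistics s = summarize(dir);
        boolean empty = s.getCount() == 0;
        System.out.printf("files=%d totalBytes=%d minBytes=%d maxBytes=%d%n",
                s.getCount(), s.getSum(),
                empty ? 0 : s.getMin(), empty ? 0 : s.getMax());
    }
}
```

Running it a few times between checkpoints should show whether the file count 
keeps growing while the individual files stay small, which would point to 
compaction not keeping up.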

Best,
Andrey

> On 29 Nov 2018, at 11:34, Andrey Zagrebin <and...@data-artisans.com> wrote:
> 
> Compaction merges SST files in the background using native threads [2]. While 
> merging, it filters out removed and expired data. The general idea is that 
> there are enough resources for compaction to keep up with the DB update rate 
> and reduce storage. It can be quite IO intensive. Compaction has a lot of 
> tuning knobs and statistics to monitor the process [1]. These are usually 
> outside the scope of Flink because they depend on the state access pattern of 
> the application. You can create and set a RocksDBStateBackend for your 
> application in Flink and configure it with custom RocksDB/column-specific 
> options.
> 
> [1] https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
> [2] https://github.com/facebook/rocksdb/wiki/Compaction
> 
>> On 29 Nov 2018, at 11:20, <bernd.winterst...@dev.helaba.de> wrote:
>> 
>> We use TtlDB because the state contents should expire automatically after 24 
>> hours. Therefore the only change we made was to have the state backend use 
>> TtlDB, with a fixed retention time, instead of RocksDB.
>> 
>> We have slow IO because only SAN volumes are available. Can you further 
>> clarify the problem with slow compaction?
>> 
>> Regards,
>> 
>> Bernd
>> 
>> 
>> -----Original Message-----
>> From: Andrey Zagrebin [mailto:and...@data-artisans.com]
>> Sent: Thursday, 29 November 2018 11:01
>> To: Winterstein, Bernd
>> Cc: Kostas Kloudas; user; s.rich...@data-artisans.com; 
>> t...@data-artisans.com; step...@data-artisans.com
>> Subject: Re: number of files in checkpoint directory grows endlessly
>> 
>> If you use incremental checkpoints, the state backend stores raw RocksDB SST 
>> files, which represent all state data. Each checkpoint adds the SST files 
>> with new updates that are not present in the previous checkpoint, basically 
>> their difference.
>> 
>> One of the following could be happening:
>> - old keys are not explicitly deleted and do not expire (depending on how 
>> TtlDB is used)
>> - compaction is too slow to drop older SST files from the latest checkpoint, 
>> so they cannot be deleted together with the previous checkpoints
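
To illustrate the second case, here is a toy model of the "shared" directory 
with one retained checkpoint. It is only a sketch under simplifying 
assumptions: the file names, the class SharedFilesModel, and the rule that 
compaction merges all live files into a single file are made up for 
illustration and are not how Flink or RocksDB are actually implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: names and the compaction rule are assumptions for illustration.
class SharedFilesModel {

    // Simulates `checkpoints` incremental checkpoints with one retained checkpoint.
    // compactEvery <= 0 means compaction never runs.
    // Returns the number of files left in the "shared" directory at the end.
    static int filesInSharedDir(int checkpoints, int compactEvery) {
        List<String> liveSst = new ArrayList<>(); // SST files currently used by the DB
        Set<String> shared = new HashSet<>();     // files present in "shared"
        for (int cp = 1; cp <= checkpoints; cp++) {
            // New updates since the last checkpoint end up in a new SST file.
            liveSst.add("flush-" + cp + ".sst");
            if (compactEvery > 0 && cp % compactEvery == 0) {
                // Compaction merges the live files into one and drops the old ones.
                liveSst.clear();
                liveSst.add("compacted-" + cp + ".sst");
            }
            // The checkpoint adds the files that are not yet in "shared".
            shared.addAll(liveSst);
            // Subsuming the previous checkpoint removes files that the latest
            // checkpoint no longer references.
            shared.retainAll(liveSst);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println("no compaction:       " + filesInSharedDir(10, 0));
        System.out.println("compaction every 3:  " + filesInSharedDir(10, 3));
    }
}
```

In this model, ten checkpoints without compaction leave ten files in "shared", 
because the latest checkpoint still references every old SST file. With 
compaction every third checkpoint, only two files remain.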
>> 
>>> On 29 Nov 2018, at 10:48, <bernd.winterst...@dev.helaba.de> wrote:
>>> 
>>> Hi,
>>> We use Flink 1.6.2. As for the checkpoint directory, there is only one 
>>> chk-xxx directory. Therefore I would expect that only one checkpoint remains.
>>> The value of 'state.checkpoints.num-retained' is not set explicitly.
>>> 
>>> The problem is not the number of checkpoints but the number of files in the 
>>> "shared" directory next to the chk-xxx directory.
>>> 
>>> 
>>> -----Original Message-----
>>> From: Andrey Zagrebin [mailto:and...@data-artisans.com]
>>> Sent: Thursday, 29 November 2018 10:39
>>> To: Kostas Kloudas
>>> Cc: Winterstein, Bernd; user; Stefan Richter; Till Rohrmann; Stephan Ewen
>>> Subject: Re: number of files in checkpoint directory grows endlessly
>>> 
>>> Hi Bernd,
>>> 
>>> Did you change 'state.checkpoints.num-retained' in flink-conf.yaml? By 
>>> default, only one checkpoint should be retained.
>>> 
>>> Which version of Flink do you use?
>>> Can you check whether the Job Master logs contain a warning like this:
>>> `Fail to subsume the old checkpoint`?
>>> 
>>> Best,
>>> Andrey
>>> 
>>>> On 29 Nov 2018, at 10:18, Kostas Kloudas <k.klou...@data-artisans.com> wrote:
>>>> 
>>>> Hi Bernd,
>>>> 
>>>> I think Till, Stefan, or Stephan (cc'ed) are the best people to answer 
>>>> your question.
>>>> 
>>>> Cheers,
>>>> Kostas
>>> 
>>> ________________________________
>>> 
>>> 
>>> Landesbank Hessen-Thueringen Girozentrale
>>> Anstalt des oeffentlichen Rechts (institution under public law)
>>> Registered office: Frankfurt am Main / Erfurt
>>> Amtsgericht Frankfurt am Main, HRA 29821 / Amtsgericht Jena, HRA 102181
>>> 
>>> Please use your E-mail connection with us exclusively for the exchange of 
>>> information. We do not accept legally binding declarations (orders, etc.) 
>>> by this means of communication.
>>> 
>>> The contents of this message are confidential and intended only for the 
>>> recipient indicated. Taking notice of this message or disclosure by third 
>>> parties is not permitted. In the event that this message is not intended 
>>> for you, please contact us via E-mail or phone.
>> 
> 
