Checkpoints question

Kirill Kosenko Wed, 28 Jul 2021 07:09:50 -0700

Hello


I'm new to Flink. I am playing with Stateful Functions and have a question 
about checkpoints and how they work.


Some configuration details:

state.backend: rocksdb

state.backend.incremental: true

execution.checkpointing.mode: AT_LEAST_ONCE


As far as I know:

1. There is a synchronous checkpoint phase. I assume the memtable is flushed to 
SST files during this phase and a snapshot is taken afterwards. This appears to 
be a blocking operation.

2. If I understand correctly, the snapshot is then sent asynchronously to 
durable storage.
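To make points 1 and 2 concrete, here is a toy Python model of a checkpoint split into a blocking synchronous phase and a background asynchronous phase. It assumes a very simplified store where "flushing" turns the in-memory table into an immutable file. `ToyStateBackend`, `put`, `checkpoint` and `durable_storage` are hypothetical names, and this is not how Flink or RocksDB actually implement snapshots.

```python
import copy
import threading


class ToyStateBackend:
    """Hypothetical toy store illustrating a sync phase followed by an async phase."""

    def __init__(self):
        self.memtable = {}         # recent writes, kept in memory
        self.sst_files = []        # immutable "files" produced by flushes
        self.durable_storage = {}  # stands in for remote checkpoint storage
        self._lock = threading.Lock()  # held by record processing and by the sync phase

    def put(self, key, value):
        # Record processing: waits while the sync phase holds the lock.
        with self._lock:
            self.memtable[key] = value

    def checkpoint(self, checkpoint_id):
        # Synchronous phase (blocking): flush the memtable into a new
        # immutable file and capture the current list of files.
        with self._lock:
            if self.memtable:
                self.sst_files.append(dict(self.memtable))
                self.memtable.clear()
            snapshot = list(self.sst_files)
        # Asynchronous phase: upload the captured files in the background,
        # without holding the lock, so put() can continue meanwhile.
        uploader = threading.Thread(target=self._upload, args=(checkpoint_id, snapshot))
        uploader.start()
        return uploader

    def _upload(self, checkpoint_id, snapshot):
        # Copy the captured files into "durable storage".
        self.durable_storage[checkpoint_id] = copy.deepcopy(snapshot)


if __name__ == "__main__":
    backend = ToyStateBackend()
    backend.put("a", 1)
    backend.put("b", 2)
    upload = backend.checkpoint(1)
    backend.put("c", 3)  # arrives after the sync phase, so not part of checkpoint 1
    upload.join()
    print(backend.durable_storage[1])  # [{'a': 1, 'b': 2}]
```

In this model only the synchronous phase stalls `put()`, so the stall grows with whatever work is done while the lock is held (here, the flush).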

Please confirm my understanding.


I noticed that message processing is delayed when a checkpoint is triggered.


Message processing usually takes less than 100 ms if no checkpoint has been 
triggered. Unfortunately, after a checkpoint is triggered, processing the same 
message takes over 2 seconds. This doesn't match our expectations: we need 
messages to be processed significantly faster (in real time), ideally in less 
than 1 second even during a checkpoint.


From what I noticed, the larger the state, the longer the checkpoint takes to 
make a snapshot, even with the incremental approach. With about 100 MB of state, 
the checkpoint takes less than 1 second, which works for me. However, with 15 GB 
of state, the checkpoint takes 3 to 5 seconds. I just want to confirm that state 
size increases the checkpoint time, even though I use incremental checkpoints.
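A minimal sketch of the idea behind incremental checkpoints, assuming the backend uploads only the SST files that were not already part of the previous checkpoint. `SstFile`, `incremental_upload`, `upload_size` and all file names and sizes are hypothetical. Under this assumption the upload volume follows the change in the file set rather than the total state size, although a compaction that rewrites large files still produces large new files to upload. The sketch says nothing about the synchronous phase, so it neither confirms nor rules out the size dependence observed above.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass(frozen=True)
class SstFile:
    name: str        # hypothetical file name
    size_bytes: int


def incremental_upload(previous, current):
    """Files in `current` that the previous checkpoint did not already include."""
    already_uploaded = {f.name for f in previous}
    return [f for f in current if f.name not in already_uploaded]


def upload_size(files):
    return sum(f.size_bytes for f in files)


if __name__ == "__main__":
    # Roughly 15 GB of state spread over 15 files of 1 GB each.
    cp1 = [SstFile(f"{i:06d}.sst", GB) for i in range(15)]
    # A little new data adds one small file.
    cp2 = cp1 + [SstFile("000015.sst", 50 * MB)]
    print(upload_size(incremental_upload(cp1, cp2)) // MB)  # 50
    # A compaction merges the first three files into one new 3 GB file.
    cp3 = cp2[3:] + [SstFile("000016.sst", 3 * GB)]
    print(upload_size(incremental_upload(cp2, cp3)) // MB)  # 3072
```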


Please confirm or disprove my understanding.

Is there a way to reduce the checkpoint time, or alternatively to make 
checkpoints completely asynchronous?


Thanks
