Hello
I'm new to Flink. I am experimenting with Stateful Functions and have a question about how checkpoints work.

Some configuration details:

    state.backend: rocksdb
    state.backend.incremental: true
    execution.checkpointing.mode: AT_LEAST_ONCE

As far as I know:

1. There is a synchronous checkpoint phase. I assume the memtable is flushed to the SST files during this phase and a snapshot is taken afterwards. This appears to be a blocking operation.
2. If I understand correctly, the snapshot is then sent asynchronously to durable storage.

Please confirm my understanding.

I noticed that message processing is delayed when a checkpoint is triggered. Message processing usually takes less than 100 ms if no checkpoint has been triggered. Unfortunately, once a checkpoint is triggered, processing the same message takes over 2 seconds. This doesn't match our expectations, as we need messages to be processed significantly faster (in real time), ideally in less than 1 second during a checkpoint.

From what I have seen, the larger the state, the longer the checkpoint takes to produce a snapshot, even with the incremental approach: if the state is about 100 MB, the checkpoint takes less than 1 second, which works for me. For 15 GB of state, however, the checkpoint takes 3-5 seconds. I just want to be sure that state size increases the checkpoint time even though I use incremental checkpoints. Please confirm or disprove my understanding.

Is there a way to speed up checkpoints, or alternatively to make them completely asynchronous?

Thanks