Hi Arvid,

Thank you for the clarification!
Best,
Eleanore

On Tue, Mar 10, 2020 at 12:32 PM Arvid Heise <ar...@ververica.com> wrote:

> Hi Eleanore,
>
> Incremental checkpointing would be needed if you have a large state
> (GB-TB), but between two checkpoints only little changes happen (KB-MB).
>
> There are two sources of large state: large user state, or large operator
> state coming from joins, windows, or grouping. In the end, you will see the
> total size in the web UI. If it's small and the checkpointing duration is
> low, there is no need to go incremental.
>
> On Tue, Mar 10, 2020 at 5:43 PM Eleanore Jin <eleanore....@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am using Apache Beam to construct the pipeline, and this pipeline is
>> running with the Flink runner.
>>
>> Both the source and the sink are Kafka topics, and I have enabled Beam's
>> exactly-once semantics.
>>
>> I believe this is how it works in Beam: messages are cached and not
>> processed by the KafkaExactlyOnceSink until the checkpoint completes and
>> all the cached messages are checkpointed; only then does it start
>> processing those messages.
>>
>> So is there any benefit to enabling incremental checkpointing when using
>> RocksDB as the backend? Since I see the state as just the consumer offsets
>> and the messages cached between checkpoints, the delta seems to be the
>> complete new checkpointed state anyway.
>>
>> Thanks a lot!
>> Eleanore
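
For anyone finding this thread later: incremental checkpointing is a property of the RocksDB state backend and can be switched on in flink-conf.yaml. A minimal sketch (the checkpoint directory below is a placeholder; substitute your own durable storage path):

```yaml
# flink-conf.yaml -- enable RocksDB with incremental checkpoints
state.backend: rocksdb
state.backend.incremental: true
# Placeholder path; point this at your durable filesystem (HDFS, S3, ...)
state.checkpoints.dir: hdfs:///flink/checkpoints
```

As Arvid notes, this only pays off when the total state is large relative to the per-checkpoint delta; for small state such as offsets and a short message cache, full checkpoints are typically fine.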