Yes. I do use RocksDB for (incremental) checkpointing. During each checkpoint 15-20GB of state gets created (new state added, some expired). I make use of FIFO compaction.
I’m a bit surprised you were able to run with 10+TB state without unaligned checkpoints because the performance in my application degrades quite a lot. Can you share your checkpoint configurations? Thanks, Vishal On 15 Nov 2022, 10:07 PM +0530, Yaroslav Tkachenko <yaros...@goldsky.com>, wrote: > Hi Vishal, > > Just wanted to comment on this bit: > > > My job has very large amount of state (>100GB) and I have no option but to > > use unaligned checkpoints. > > I successfully ran Flink jobs with 10+ TB of state and no unaligned > checkpoints enabled. Usually, you consider enabling them when there is some > kind of skew in the topology, but IMO it's unrelated to the state size. > > > Reducing the checkpoint interval is not really an option given the size of > > the checkpoint > > Do you use RocksDB state backend with incremental checkpointing? > > > On Tue, Nov 15, 2022 at 12:07 AM Vishal Surana <vis...@moengage.com> wrote: > > > I wanted to achieve exactly once semantics in my job and wanted to make > > > sure I understood the current behaviour correctly: > > > > > > 1. Only one Kafka transaction at a time (no concurrent checkpoints) > > > 2. Only one transaction per checkpoint > > > > > > > > > My job has very large amount of state (>100GB) and I have no option but > > > to use unaligned checkpoints. With the above limitation, it seems to me > > > that if checkpoint interval is 1 minute and checkpoint takes about 10 > > > seconds to complete then only one Kafka transaction can happen in 70 > > > seconds. All of the output records will not be visible until the > > > transaction completes. This way a steady stream of inputs will result in > > > an buffered output stream where data is only visible after a minute, > > > thereby destroying any sort of real time streaming use cases. Reducing > > > the checkpoint interval is not really an option given the size of the > > > checkpoint. Only way out would be to allow for multiple transactions per > > > checkpoint. > > > > > > Thanks, > > > Vishal