Hi Prabhu, I noticed that `Full Checkpoint Data Size` is equal to `Checkpointed Data Size`, which suggests every checkpoint is a full snapshot. Which state backend are you using? I recommend the RocksDB state backend for your job; with it you can turn on incremental checkpoints [1], which will reduce the amount of data written per checkpoint.
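Switching to RocksDB with incremental checkpoints can be done programmatically; here is a minimal sketch (the checkpoint interval of 60 s is an illustrative assumption, not from this thread):

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbIncrementalExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // true = enable incremental checkpoints: only SST files changed
        // since the last checkpoint are uploaded, not the full state.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

        // Illustrative interval; tune to your latency/recovery trade-off.
        env.enableCheckpointing(60_000);
    }
}
```

The same effect can be achieved in `flink-conf.yaml` via `state.backend: rocksdb` and `state.backend.incremental: true`.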
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/large_state_tuning/#incremental-checkpoints

Best,
Shammon FY

On Tue, Jun 20, 2023 at 4:50 PM Alex Nitavsky <alexnitav...@gmail.com> wrote:

> Hello Prabhu,
>
> In your place I would check:
>
> 1. That there is no "state leak" in your job. It seems that state only
> accumulates and is never cleaned up; e.g. a timer that should clear the
> state for a key may not be configured correctly.
>
> 2. Whether you accumulate state in a large window, e.g. with a 2-hour
> tumbling window the maximum job state is only reached after two hours. In
> that case your job should be scaled or optimized.
>
> Best
> Alex
>
> On Tue, Jun 20, 2023 at 10:39 AM Prabhu Joseph <prabhujose.ga...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Flink checkpoints time out, and the checkpointed data size doubles with
>> every checkpoint. Any ideas on what could be wrong in the application or
>> how to debug this?
>>
>> [image: checkpoint_issue.png]
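For the state-leak scenario Alex describes, Flink's state TTL can bound per-key state even when a cleanup timer is missing or misconfigured. A minimal sketch (the 2-hour TTL and the `lastSeen` descriptor name are illustrative assumptions, not from this thread):

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Expire entries 2 hours after creation or last write (illustrative TTL).
StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.hours(2))
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        // With the RocksDB backend, drop expired entries during compaction.
        .cleanupInRocksdbCompactFilter(1000)
        .build();

// Hypothetical state descriptor for illustration.
ValueStateDescriptor<Long> descriptor =
        new ValueStateDescriptor<>("lastSeen", Long.class);
descriptor.enableTimeToLive(ttlConfig);
```

With TTL enabled, stale per-key entries are removed by the backend itself, so state stops growing without bound even if application-level cleanup logic has a gap.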