Hi Prabhu,

I see that `Full Checkpoint Data Size` is equal to `Checkpointed Data
Size`, which suggests that every checkpoint is a full snapshot. Which state
backend are you using? I recommend using the RocksDB state backend for your
job; with it, you can turn on incremental checkpoints [1], which reduce the
amount of data written for each checkpoint.
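
As a rough sketch, the two settings could look like this in flink-conf.yaml.
The key names are the ones used by the Flink 1.x releases current at the time
of this thread (newer releases have renamed some of them, so please check the
docs for your version), and the checkpoint directory is only a placeholder,
not something taken from your setup:

```yaml
# Use RocksDB as the state backend (key name as in Flink 1.x).
state.backend: rocksdb

# Upload only the changes since the previous checkpoint instead of
# a full snapshot each time.
state.backend.incremental: true

# Placeholder path (an assumption): point this at your own durable storage.
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```

With incremental checkpoints enabled, `Checkpointed Data Size` should normally
be smaller than `Full Checkpoint Data Size` once the job has run for a while.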

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/large_state_tuning/#incremental-checkpoints

Best,
Shammon FY

On Tue, Jun 20, 2023 at 4:50 PM Alex Nitavsky <alexnitav...@gmail.com>
wrote:

> Hello Prabhu,
>
> In your place I would check:
>
> 1. That there is no "state leak" in your job. It looks like state only
> accumulates and is never cleaned up, e.g. a timer that should clear the
> state for some key may not be configured correctly.
>
> 2. Whether you accumulate the state in a big window, e.g. with a 2-hour
> tumbling window the job state reaches its maximum only after two hours. If
> so, your job should be scaled up or optimized.
>
> Best
> Alex
>
> On Tue, Jun 20, 2023 at 10:39 AM Prabhu Joseph <prabhujose.ga...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Our Flink checkpoints time out, and the checkpointed data size doubles
>> with every checkpoint. Any ideas on what could be wrong in the application
>> or how to debug this?
>>
>> [image: checkpoint_issue.png]
>>
>>
>>
