Hi community,

I'm glad that Flink 1.8.0 introduced cleanupInRocksdbCompactFilter
to support state cleanup for the RocksDB backend.
We have an application that relies heavily on managed keyed state.
Because we use RocksDB as the state backend, we were suffering from
ever-growing state size. To be more specific, our checkpoint size grew
to 200GB in 2 weeks.

After upgrading to 1.8.0 and using the cleanupInRocksdbCompactFilter TTL
config, the checkpoint size has never grown beyond 10GB.
However, two days after the upgrade, checkpointing started to fail with
"*Checkpoint expired before completing*".
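
For reference, the TTL configuration mentioned above looks roughly like the
sketch below. It is a simplified illustration, not our exact code: the 7-day
TTL, the state name "lastSeen", and the class name are placeholders, and the
update type and visibility settings are assumptions. As far as I understand,
in 1.8 the compaction filter also has to be switched on separately in the
RocksDB backend (noted in the comment).

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class TtlConfigSketch {

    // Builds a descriptor whose expired entries are dropped during RocksDB compaction.
    // Assumption: the filter is enabled in the backend as well, e.g. via
    // state.backend.rocksdb.ttl.compaction.filter.enabled: true in flink-conf.yaml.
    public static ValueStateDescriptor<Long> lastSeenDescriptor() {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.days(7)) // placeholder TTL, not our real value
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .cleanupInRocksdbCompactFilter()
            .build();

        // "lastSeen" is a hypothetical state name used only for illustration.
        ValueStateDescriptor<Long> descriptor =
            new ValueStateDescriptor<>("lastSeen", Long.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```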

From the log, I could not get anything useful.
But in the Flink UI, the last successful checkpoint took 1m to finish, and
our checkpoint timeout is set to 15m.
It seems that the checkpoint duration became extremely long all of a sudden.

Is there any way I can look into this further? Or is there any
direction in which I can tune the TTL for the application?

Thanks in advance!

Best regards,
Mu
