Re: Checkpointing is not performing well

Rafi Aroch Sat, 07 Sep 2019 13:26:07 -0700

Hi Ravi,

Consider moving to the RocksDB state backend, where you can enable incremental
checkpointing. Each checkpoint then uploads only what changed since the
previous one, so your checkpoint size should stay fairly constant even as
your state grows.


https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/state/state_backends.html#the-rocksdbstatebackend
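
As a rough sketch, the relevant flink-conf.yaml entries might look like the
ones below. The bucket and path are placeholders of mine, not taken from your
setup, so please check the keys against the docs for your Flink version:

```
# Use RocksDB instead of the filesystem state backend
state.backend: rocksdb
# Upload only the changes since the previous checkpoint
state.backend.incremental: true
# Placeholder location; replace with your own S3 bucket and path
state.checkpoints.dir: s3://<your-bucket>/<checkpoint-path>
```

The same can be set per job in code by passing true as the second argument
to the RocksDBStateBackend constructor.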


Thanks,
Rafi

On Sat, Sep 7, 2019, 17:47 Ravi Bhushan Ratnakar <
ravibhushanratna...@gmail.com> wrote:

> Hi All,
>
> I am writing a streaming application using Flink 1.9. This application
> consumes data from a Kinesis stream carrying Avro payloads. The
> application uses a KeyedProcessFunction to execute business logic per
> correlation id, using the event time characteristic, with the
> configuration below:
> StateBackend - filesystem with S3 storage
> registerEventTimeTimer time for each key - currentWatermark + 15
> seconds
> checkpoint interval - 1min
> minPauseBetweenCheckpointInterval - 1 min
> checkpoint timeout - 10mins
>
> incoming data rate from kinesis -  ~10 to 21GB/min
>
> Number of TaskManagers - 200 (r4.2xlarge -> 8 vCPU, 61GB)
>
> The first 2-4 checkpoints complete within 1 min, while the state size is
> usually 50GB. As the state size grows beyond 50GB, checkpointing starts
> taking more than 1 min, and the time keeps increasing up to 10 mins, at
> which point the checkpoint fails. As soon as a checkpoint takes more than
> 1 min to complete, the application starts processing slowly and its
> output starts lagging.
>
> Any suggestion to fine tune checkpoint performance would be highly
> appreciated.
>
> Regards,
> Ravi
>
