Re: Checkpoints timing out for no apparent reason

Congxian Qiu Wed, 17 Jul 2019 23:17:14 -0700

Hi

The image did not show. An incremental checkpoint consists of: 1) flushing
the memtable to sst files; 2) taking a RocksDB checkpoint; 3) snapshotting
the metadata; 4) uploading the needed sst files to remote storage. The first
three steps run in the sync part and the fourth runs in the async part.
Could you please check whether the sync part or the async part takes too
long? For the sync part, you could check the disk performance during the
checkpoint; for the async part, you could check the network performance and
the S3 client.
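
As a rough illustration of that comparison, here is a minimal Python sketch
that picks the slowest phase from per-subtask durations. The input shape
(an "index" plus a "checkpoint" dict with "sync" and "async" values in
milliseconds) and the sample numbers are assumptions made for the example,
not output from your job; adapt the keys to whatever your checkpoint
statistics actually expose.

```python
# Sketch: decide whether the sync or async part dominates a checkpoint.
# The dict layout below is an assumed shape for per-subtask statistics
# (durations in milliseconds), not a guaranteed Flink format.

def slowest_phase(subtasks):
    """Return (phase, subtask_index, duration_ms) for the longest phase seen."""
    worst = ("none", -1, 0)
    for st in subtasks:
        for phase in ("sync", "async"):
            duration = st["checkpoint"][phase]
            if duration > worst[2]:
                worst = (phase, st["index"], duration)
    return worst


def hint(phase):
    """Map the dominating phase to the check suggested in this thread."""
    if phase == "sync":
        return "check local disk performance during the checkpoint"
    if phase == "async":
        return "check network performance and the S3 client"
    return "no duration data"


if __name__ == "__main__":
    sample = [  # hypothetical numbers, not taken from the reported job
        {"index": 0, "checkpoint": {"sync": 120, "async": 4300}},
        {"index": 1, "checkpoint": {"sync": 95, "async": 61000}},
    ]
    phase, index, ms = slowest_phase(sample)
    print(f"subtask {index}: {phase} part took {ms} ms -> {hint(phase)}")
```

If the async part of a few subtasks is far larger than the rest, that would
point at the upload step rather than at RocksDB or the local disk.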


Best,
Congxian


spoganshev <s.pogans...@slice.com> wrote on Wed, Jul 17, 2019 at 4:02 AM:

> We have an issue with a job when it occasionally times out while creating
> snapshots for no apparent reason:
>
> </file/t1761/checkpoints-issue.png>
>
> Details:
> - Flink 1.7.2
> - Checkpoints are saved to S3 with presto
> - Incremental checkpoints are used
>
> What might be the cause of this issue? It feels like some internal s3
> client timeout issue, but I didn't find any configuration of such timeout.
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
