Hi,
Yes, for example [1]. Most of the points that you mentioned are already visible
in the UI and/or via metrics; just take a look at the subtask checkpoint stats.
> when barriers were instrumented at source from checkpoint coordinator
That’s checkpoint trigger time.
> when each down stream task
Just echoing what Lu mentioned: is there documentation where we can find more info on
* when barriers were instrumented at source from checkpoint coordinator
* when each down stream task observes the first barrier of a chk
* when the list of barriers of a chk arrives at a task
* when the snapshot starts/completes
*
Hi
If the bottleneck is the upload part, have you tried uploading the files
using multiple threads [1]?
[1] https://issues.apache.org/jira/browse/FLINK-11008
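
To illustrate why multiple upload threads can help when the upload is the bottleneck, here is a minimal sketch, not Flink's actual implementation. The class `ParallelUploadSketch` and the methods `uploadOne` and `uploadAll` are hypothetical names, and the upload itself is simulated with a sleep instead of a real S3 call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ParallelUploadSketch {

    // Hypothetical stand-in for uploading one file; the sleep simulates network latency.
    static long uploadOne(String fileName, long simulatedMillis) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(simulatedMillis);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Uploads all files on a fixed-size thread pool and returns the wall-clock time in ms.
    static long uploadAll(List<String> files, int threads, long simulatedMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (String file : files) {
                pending.add(pool.submit(() -> uploadOne(file, simulatedMillis)));
            }
            for (Future<Long> upload : pending) {
                upload.get(); // wait for completion and surface any upload failure
            }
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        // Made-up file names, for illustration only.
        List<String> files = Arrays.asList("000012.sst", "000013.sst", "000014.sst", "000015.sst");
        System.out.println("1 thread:  " + uploadAll(files, 1, 50) + " ms");
        System.out.println("4 threads: " + uploadAll(files, 4, 50) + " ms");
    }
}
```

With one thread the simulated uploads run back to back; with four they overlap, so the total time approaches that of a single upload. In Flink itself, I believe the thread count added by that ticket is configured through `state.backend.rocksdb.checkpoint.transfer.thread.num`, but please check the documentation for your version.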
Best,
Congxian
On Fri, Apr 24, 2020 at 12:38 PM Lu Niu wrote:
> Hi, Robert
>
> Thanks for replying. Yeah. After I added monitoring on the above path, it
> sh
Hi, Robert
Thanks for replying. Yeah. After I added monitoring on the above path, it
shows the slowness did come from uploading files to S3. Right now I am still
investigating the issue. At the same time, I am trying PrestoS3FileSystem
to check whether that can mitigate the problem.
Best
Lu
On Thu
Hi Lu,
Were you able to resolve the issue with the slow async checkpoints?
I've added Yu Li to this thread. He has more experience with the state
backends to decide which monitoring is appropriate for such situations.
Best,
Robert
On Tue, Apr 21, 2020 at 10:50 PM Lu Niu wrote:
> Hi, Robert
>
Hi, Robert
Thanks for replying. To improve observability, do you think we should
expose more metrics in checkpointing? For example, in incremental
checkpoints, the time spent on uploading SST files?
https://github.com/apache/flink/blob/5b71c7f2fe36c760924848295a8090898cb10f15/flink-state-backends/
Hi,
Did you check the TaskManager logs for retries by the s3a file
system during checkpointing?
I'm not aware of any metrics in Flink that could be helpful in this
situation.
Best,
Robert
On Tue, Apr 14, 2020 at 12:02 AM Lu Niu wrote:
> Hi, Flink users
>
> We notice sometimes async ch