Hi all!

I would like to open a discussion on a new proposal, FLIP-483 [1].
Motivation
FLIP-384 [2] added trace/span reporting capability to Flink, which has been
used in a couple of places, such as reporting on the checkpointing and
recovery processes.

With a flat structure of spans (spans cannot have children), it is difficult
to accurately report on checkpointing or recovery. Currently, a single
top-level span for checkpointing or recovery aggregates some metrics, such as
the maximum and the sum of how long the state download/upload took. However,
this hides some details, such as how long each task and/or subtask spent
downloading the state.

In this FLIP, we want to introduce a general mechanism for reporting child
spans.

For more information, please see FLIP-483 [1].

I'm looking forward to your thoughts on this.

Best,
Piotrek

[1] https://cwiki.apache.org/confluence/x/4IyMEw
[2] https://cwiki.apache.org/confluence/x/TguZE
