Hi Jamie!
(and adding Klou)
I think the Streaming File Sink has a limit on the number of concurrent
uploads. Could it be that too many uploads get enqueued and, at some point,
the checkpoint blocks for a long time until that queue is worked off?
Klou, do you have more insights here?
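To make the queueing hypothesis concrete, here is a toy model of how long a
backlog of uploads takes to drain when only a fixed number can run at once.
This is only a sketch, not the sink's actual implementation: the function
name drain_time, the upload count, and the per-upload duration are all
made-up assumptions for illustration.

```python
import heapq

def drain_time(upload_durations, max_concurrent):
    """Seconds until every queued upload has finished, given a fixed
    number of upload slots (hypothetical model, not Flink code)."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    # Each heap entry is the time at which one upload slot becomes free.
    slots = [0.0] * max_concurrent
    heapq.heapify(slots)
    finish = 0.0
    for duration in upload_durations:
        start = heapq.heappop(slots)      # earliest slot to free up
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(slots, end)
    return finish

# Hypothetical backlog: 600 queued uploads of 3 seconds each.
queued = [3.0] * 600
print(drain_time(queued, max_concurrent=1))    # one slot: fully serialized
print(drain_time(queued, max_concurrent=20))   # twenty slots
```

If a checkpoint has to wait for such a backlog, the wait scales with the
backlog size divided by the concurrency limit, which could be enough to
explain a stall of many minutes on a single subtask.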
Best,
Stephan
On
Thanks Konstantin,
Refining this a little bit: all the checkpoints for all the subtasks
upstream of the sink complete in seconds. Most of the subtasks of the sink
itself also complete in seconds other than these very few "slow" ones. So,
somehow we are taking at worst 29 minutes to clear the d
Hi Jamie,
I think your interpretation is correct. It takes a long time until the
first barrier reaches the "slow" subtask and, in the case of the screenshot,
another 3m 22s until the last barrier reaches the subtask. Regarding the
total amount of data: depending on your checkpoint configuration
(es
Alright, here's another case where this is very pronounced. Here's a link
to a couple of screenshots showing the overall stats for a slow task as
well as a zoom in on the slowest of them: https://pasteboard.co/IxhGWXz.png
This is the sink stage of a pipeline with 3 upstream tasks. All the
upstr
Here's the second screenshot I forgot to include:
https://pasteboard.co/IxhNIhc.png
On Fri, Sep 13, 2019 at 4:34 PM Jamie Grier wrote:
> Alright, here's another case where this is very pronounced. Here's a link
> to a couple of screenshots showing the overall stats for a slow task as
> well as
Thanks Seth and Stephan,
Yup, I had intended to upload an image. Here it is:
https://pasteboard.co/Ixg0YP2.png
This one is very simple and I suppose can be explained by heavy
backpressure. The more complex version of this problem I run into
frequently is where a single (or a couple of) sub-task(
Hi Jamie!
Did you mean to attach a screenshot? If yes, you need to share that through
a different channel; unfortunately, the mailing list does not support
attachments.
Seth is right about how the time is measured.
One important bit to add to the interpretation:
- For non-source tasks, the time inclu
Great timing, I just debugged this on Monday. E2e time is checkpoint
coordinator to checkpoint coordinator, so it includes the RPC to the source
and the RPC from the operator back to the JM.
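As a rough mental model of that measurement, the end-to-end time for one
subtask can be thought of as a sum of stages. The sketch below is only an
illustration: the class, its field names, and all the numbers are
hypothetical, and the split into these particular stages is an assumption
rather than how Flink reports the metric.

```python
from dataclasses import dataclass, fields

@dataclass
class CheckpointTimings:
    """Hypothetical per-subtask breakdown, all values in seconds."""
    trigger_rpc_s: float    # coordinator -> source trigger RPC
    barrier_wait_s: float   # until the first barrier reaches the subtask
    alignment_s: float      # first barrier -> last barrier
    snapshot_s: float       # taking and persisting the state snapshot
    ack_rpc_s: float        # operator -> JM acknowledgement RPC

    def end_to_end_s(self) -> float:
        # Coordinator-to-coordinator time is the sum of every stage.
        return sum(getattr(self, f.name) for f in fields(self))

    def dominant(self) -> str:
        # Name of the stage that contributes the most time.
        return max(fields(self), key=lambda f: getattr(self, f.name)).name

# Made-up numbers for a "slow" subtask.
slow = CheckpointTimings(
    trigger_rpc_s=0.5,
    barrier_wait_s=1500.0,
    alignment_s=202.0,
    snapshot_s=4.0,
    ack_rpc_s=0.5,
)
print(slow.end_to_end_s(), slow.dominant())
```

With numbers like these, the snapshot itself is a small part of the total,
and most of the end-to-end time is spent before the subtask has even seen
the barrier.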
Seth
> On Sep 11, 2019, at 6:17 PM, Jamie Grier wrote:
>
> Hey all,
>
> I need to make sense of this behav