Backpressure is typically caused by something like one of these things:

* problems relating to i/o to external services (e.g., enrichment via an
API or database lookup, or a sink)
* data skew (e.g., a hot key)
* under-provisioning, or competition for resources
* spikes in traffic
* timer storms

I would start to debug this by looking for signs of significant
asymmetry in the metrics (across the various subtasks), or resource
exhaustion. Could be related to the network, GC, CPU, disk i/o, etc.
Flink's webUI will show you checkpoint size and timing information for each
sub-task; you can learn a lot from studying that data.

Relating to session windows -- could you occasionally have an unusually
long session, and might that cause problems?

Best,
David

On Tue, Jul 14, 2020 at 10:12 PM Nelson Steven <nelsonste...@johndeere.com>
wrote:

> Hello!
>
>
>
> We are experiencing occasional backpressure on a Window function in our
> pipeline. The window is on a KeyedStream and is triggered by an
> EventTimeSessionWindows.withGap(Time.seconds(30)). The prior step does a
> fanout and we use the window to sort things into batches based on the Key
> for the keyed stream. We aren’t seeing an unreasonable amount of records
> (500-600/s) on a parallism of 32 (prior step has a parallelism of 4). We
> are as interested in learning out to debug the issue as we are in fixing
> the actual problem. Any ideas?
>
>
>
> -Steve
>

Reply via email to