Hi Brian,

Not a direct answer to your question, but a few thoughts:

  *   Increasing/configuring buffers might not help (Flink 
self-organizes/optimizes buffers), but increasing parallelism might.
  *   Something in the numbers you related does not quite fit: with parallelism 
1 and no slot-sharing groups configured, you should have 1 slot in use and 19 
of them available, not 8!?
  *   To find out where the time/CPU capacity is spent, you can:
     *   Enable flame graph generation in the config, and then
     *   On the job dashboard, look at the flame graph of each operator to 
see what’s going on (click the operator, select the flame-graph tab on the 
right).
  *   My gut feeling is that it might have to do with serialization, given 
that your stream events are big.
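The flame graph feature mentioned above is off by default; as a sketch (option name from the Flink docs, available since Flink 1.13), it can be switched on in flink-conf.yaml:

```yaml
# Enable on-demand flame graph sampling in the web UI
# (adds sampling overhead only while a flame graph is open).
rest.flamegraph.enabled: true
```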

Cheers

Thias



From: Fanchun Jin <foun...@gmail.com>
Sent: Tuesday, August 6, 2024 10:54 PM
To: user@flink.apache.org
Subject: In what cases Flink op is backpressured but the downstream op is not 
busy or backpressured?



Hi,


We are using Flink 1.15.4 streaming, running on AWS EKS, with the default flow 
control config (e.g. floating-buffers-per-gate, memory.buffers-per-channel, 
max-buffers-per-channel).

The entire pipeline is running on a single task manager (10 cores, 24G RAM, 20 
total task slots, 8 available task slots). Pod CPU utilization is 20% of the 
pod limit, and pod memory utilization is 60% of the pod limit.

A portion of the pipeline topology is shown 
here<https://i.sstatic.net/cJ3XCGgY.png>, where OpA and OpB publish messages 
to OpC, and all 3 operators have parallelism 1.

OpA generates 30 records per minute, and each record is 8 MB. OpB generates 
180K records per minute, and each record is 1.8 KB.
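For reference, a quick back-of-envelope check of the two input byte rates, using only the figures above:

```python
# Rough byte rates for the two inputs to OpC, from the numbers above.
MB = 1024 * 1024
KB = 1024

opa_mb_per_s = 30 * 8 * MB / 60 / MB         # 30 records/min x 8 MB each
opb_mb_per_s = 180_000 * 1.8 * KB / 60 / MB  # 180K records/min x 1.8 KB each

print(f"OpA: {opa_mb_per_s:.1f} MB/s")  # 4.0 MB/s in a few large records
print(f"OpB: {opb_mb_per_s:.1f} MB/s")  # 5.3 MB/s in many small records
```

The total volumes are comparable; the asymmetry is in record size (8 MB vs 1.8 KB records).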

OpC doesn't have external dependencies and simply forwards the messages 
downstream.

The issue is that while OpC and OpA are neither backpressured nor busy, 
OpB is 100% backpressured.

I tried with a larger flow control config (floating-buffers-per-gate: 200, 
buffers-per-channel: 100, max-buffers-per-channel: 100), but it didn't help.
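For anyone reproducing this, the shorthand names above correspond to the following fully-qualified keys in flink-conf.yaml (a sketch assuming Flink 1.15 option names, with the values tried above):

```yaml
taskmanager.network.memory.floating-buffers-per-gate: 200
taskmanager.network.memory.buffers-per-channel: 100
taskmanager.network.memory.max-buffers-per-channel: 100
```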

I also tried increasing the parallelism of OpC to 8, but that didn't help 
either.

If I remove the OpA input to OpC, OpB flows much faster, and OpC becomes busy 
processing messages from OpB.

What can I do to reconnect OpA -> OpC while keeping OpB less backpressured?

Any help is greatly appreciated!

Thanks,

Brian

This message is intended only for the named recipient and may contain 
confidential or privileged information. As the confidentiality of email 
communication cannot be guaranteed, we do not accept any responsibility for the 
confidentiality and the intactness of this message. If you have received it in 
error, please advise the sender by return e-mail and delete this message and 
any attachments. Any unauthorised use or dissemination of this information is 
strictly prohibited.
