Thank you @Piotr Nowojski for helping me.

From: Piotr Nowojski <pnowoj...@apache.org>
Sent: Wednesday, September 29, 2021 12:53 PM
To: Sanket Agrawal <sanket.agra...@infosys.com>
Cc: Ragini Manjaiah <ragini.manja...@gmail.com>; user@flink.apache.org
Subject: Re: Event is taking a lot of time between the operators


[**EXTERNAL EMAIL**]
Hi Sanket,

As I mentioned in the previous email, it's most likely still an issue of 
backpressure, and you can check it as I described in that message. Your 
records are stuck either in the network buffers between the two async 
operations (I), if there is a network exchange, and/or inside the 
`AsyncWaitOperator`'s internal queue (II). If this is causing you problems:

I. For the former problem (network buffers) you can:
a) get rid of the network exchange by removing keyBy/shuffle/rebalance (might 
not be feasible, depending on your business logic)
b) reduce the amount of in-flight data. Flink 1.14 adds an automatic buffer 
debloating mechanism. You cannot use it in Flink 1.8, but you can manually 
tweak both the number and the size of the buffers. You can read about it 
here [1]; just ignore the automatic buffer debloating mechanism.
II. For the latter problem (the internal queue) you can change the size of the 
queue by adjusting the `capacity` parameter [2]

The more buffered in-flight data you have between operators, the longer the 
delay between processing the same record by two different operators.
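
As a rough illustration of the point above, the relationship between queued 
records and delay can be estimated with Little's law (time in queue = records 
queued / throughput). The sketch below is only a back-of-the-envelope 
calculation: it assumes a steady state, and the class and method names are 
made up for this example. The input rate (60 messages per second) and the 
observed delay (about 10 seconds) are the figures reported in this thread.

```java
/** Back-of-the-envelope estimate of the queueing delay between two operators. */
final class InFlightDelayEstimate {

    /** Little's law, steady state assumed: delay = queued records / throughput. */
    static double delaySeconds(long queuedRecords, double recordsPerSecond) {
        if (recordsPerSecond <= 0) {
            throw new IllegalArgumentException("recordsPerSecond must be positive");
        }
        return queuedRecords / recordsPerSecond;
    }

    /** Inverse: how many queued records would explain an observed delay. */
    static long queuedRecords(double delaySeconds, double recordsPerSecond) {
        return Math.round(delaySeconds * recordsPerSecond);
    }

    public static void main(String[] args) {
        double ratePerSecond = 60.0;  // input rate reported in this thread
        double observedDelay = 10.0;  // delay reported between the two async operators

        long queued = queuedRecords(observedDelay, ratePerSecond);
        System.out.println("Records queued to explain the delay: " + queued);
        System.out.println("Delay with 100 queued records: "
                + delaySeconds(100, ratePerSecond) + " s");
    }
}
```

Under these assumptions, a 10 second delay at 60 messages per second 
corresponds to about 600 records sitting in buffers or queues between the two 
operators, which is why shrinking the buffers or the queue shortens the delay.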

Best,
Piotrek

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/network_mem_tuning/
[2] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/#async-io-api



On Wed, 29 Sep 2021 at 08:20, Sanket Agrawal 
<sanket.agra...@infosys.com> wrote:
Hi Ragini,

To measure the time spent in an async operator, we log at the first and the 
last statement of asyncInvoke. To measure the time between the async 
operators, we simply take the difference between message2's start time and 
message1's end time. Also, we are using a parallelism of 1.
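
One caveat with logging at the first and last statement of a method: if the 
work itself runs inside CompletableFuture.supplyAsync, the last statement of 
the submitting method may run long before the call has finished. The sketch 
below is a minimal plain-Java illustration (no Flink involved) that stops the 
clock when the future completes instead. The class name, the `Timed` holder 
and the sleeping `slowCall` are hypothetical stand-ins, not code from the job 
discussed here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Sketch: time an asynchronous call from submission to completion. */
final class AsyncTimingSketch {

    /** A result together with the wall-clock time it took to become available. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the call asynchronously and stops the clock when the future completes. */
    static <T> CompletableFuture<Timed<T>> timed(Supplier<T> call) {
        final long submittedAt = System.nanoTime();
        return CompletableFuture.supplyAsync(call).thenApply(value -> {
            long elapsedNanos = System.nanoTime() - submittedAt;
            return new Timed<T>(value, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
        });
    }

    /** Hypothetical stand-in for the external API call; it only sleeps. */
    static String slowCall() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "ok";
    }

    public static void main(String[] args) {
        long before = System.nanoTime();
        CompletableFuture<Timed<String>> future = timed(AsyncTimingSketch::slowCall);
        // Time spent only on handing the work over, not on the call itself.
        long submitMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - before);

        Timed<String> result = future.join();
        System.out.println("time to submit:   " + submitMillis + " ms");
        System.out.println("time to complete: " + result.elapsedMillis + " ms");
    }
}
```

Comparing the two printed values shows how much of the measured time is the 
call itself and how much is only the hand-off to the executor.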

Please let me know if you need any other information or if you have any 
recommendations on improving the approach.

Thanks,
Sanket Agrawal

From: Ragini Manjaiah <ragini.manja...@gmail.com>
Sent: Wednesday, September 29, 2021 11:17 AM
To: Sanket Agrawal <sanket.agra...@infosys.com>
Cc: Piotr Nowojski <pnowoj...@apache.org>; user@flink.apache.org
Subject: Re: Event is taking a lot of time between the operators


Hi Sanket,
I have a similar use case. How are you measuring the time taken by the 
`Async1` function to return the result and by the external API call?

On Wed, Sep 29, 2021 at 10:47 AM Sanket Agrawal 
<sanket.agra...@infosys.com> wrote:
Hi @Piotr Nowojski,

Thank you for replying. Yes, the first async is taking between 1300 and 1500 
milliseconds, but the work is run via CompletableFuture.supplyAsync and the 
async capacity is set to 1000.

Async code structure: inside asyncInvoke we call 
CompletableFuture.supplyAsync, and inside supplyAsync we call an external API, 
which takes around 1005 ms to 1040 ms. The rest of the code for request 
creation and response validation is also inside the supplyAsync and takes 
around 250 ms.

This way we tried to make sure that the main async thread (the async operator 
does not use multiple threads directly) is available for the next message as 
soon as it calls CompletableFuture.supplyAsync on the current message.
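
A related detail that may be worth checking: according to the 
CompletableFuture Javadoc, supplyAsync without an explicit Executor runs on 
ForkJoinPool.commonPool(), unless that pool does not support a parallelism of 
at least two, in which case a new thread is created for each task. Passing 
your own executor makes the number of concurrent blocking calls explicit. The 
sketch below is plain Java without Flink; `callExternalApi`, the 100 ms sleep 
and the pool size of 8 are hypothetical placeholders. As a rough guide only, 
60 messages per second at 1.3 to 1.5 seconds per call would mean roughly 80 to 
90 calls in flight at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: run blocking calls on a dedicated, explicitly sized thread pool. */
final class ExplicitExecutorSketch {

    /** Hypothetical stand-in for the blocking external API call. */
    static String callExternalApi(String request) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "response-for-" + request;
    }

    /** Submits every request to the given executor, then waits for all results in order. */
    static List<String> callAll(List<String> requests, ExecutorService executor) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String request : requests) {
            // The second argument selects the pool instead of the JVM default.
            futures.add(CompletableFuture.supplyAsync(() -> callExternalApi(request), executor));
        }
        List<String> responses = new ArrayList<>();
        for (CompletableFuture<String> future : futures) {
            responses.add(future.join());
        }
        return responses;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(8); // size is an assumption
        try {
            System.out.println(callAll(Arrays.asList("a", "b", "c"), executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

If the pool is smaller than the number of calls that need to be in flight, 
requests wait for a free thread, and that waiting time shows up as extra 
latency on top of the API call itself.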

Thanks,
Sanket Agrawal

From: Piotr Nowojski <pnowoj...@apache.org>
Sent: Tuesday, September 28, 2021 8:02 PM
To: Sanket Agrawal <sanket.agra...@infosys.com>
Cc: user@flink.apache.org
Subject: Re: Event is taking a lot of time between the operators


Hi,

With Flink 1.8.0 I'm not sure how reliable the backpressure status in the 
WebUI is when it comes to the async operators. If I remember correctly, until 
around Flink 1.10 (+/- 2 versions) backpressure monitoring worked by checking 
thread dumps for threads stuck requesting Flink's network memory buffers. If 
the AsyncFunction in your job is the source of the backpressure, it would be 
skipped and not reported. For analysing backpressure I would highly recommend 
upgrading to Flink 1.13.x, as it has greatly improved tooling for that [1], 
and in that version AsyncFunctions are definitely handled correctly. Since 
Flink 1.10, I believe, you can use the `isBackPressured` metric. In earlier 
versions you would have to rely on buffer usage metrics as described here [2].


[1] 
https://flink.apache.org/2021/07/07/backpressure.html
[2] 
https://flink.apache.org/2019/07/23/flink-network-stack-2.html#network-metrics

Apart from the backpressure, part of the problem might simply be how long it 
takes for the `Async1` function to return the result. Have you checked that? 
Isn't it taking a couple of seconds?

Best,
Piotrek

On Tue, 28 Sep 2021 at 15:55, Sanket Agrawal 
<sanket.agra...@infosys.com> wrote:
Hi All,

I am new to Flink. While developing a Flink application, we observed that a 
message takes around 10 seconds to get from one async operator to the next. 
Below are the details.


  *   Flink Flow: Kinesis Source -> Process -> Async1 -> Async2 -> Process -> 
Kinesis Sink
  *   Environment: Amazon KDA, 1 Kinesis Processing Unit (1 vCore and 4 GB 
RAM), and a parallelism of 1.
  *   Flink Version: 1.8.0
  *   Backpressure: Flink dashboard shows that backpressure is OK.
  *   Input rate: 60 messages per second.

Any kind of pointers/help will be very useful.

Thanks,
Sanket Agrawal
