Hi Sanket,

As I mentioned in the previous email, it's most likely still an issue of
backpressure and you can check it as I described in that message. Either
your records are stuck in the network buffers between (I) to async
operations (if there is a network exchange), and/or inside the
`AsyncWaitOperator`'s internal queue (II). If it's causing you problems

I. For the former problem (network buffers) you can:
a) get rid of the network exchange, via removing keyBy/shuffle/rebalance
(might not be feasible, depending on your business logic)
b) reduce the amount of the in-flight data. In Flink 1.14 we are adding
automatic buffer debloating mechanism, in Flink 1.8 you can not use, but
you could manually tweak both amount and the size of the buffers. You can
read about it here [1], just ignore the automatic buffer debloating
mechanism.
II. You can change the size of the internal queue by adjusting the
`capacity` parameter [2]

The more buffered in-flight data you have between operators, the longer the
delay between processing the same record by two different operators.

Best,
Piotrek

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/network_mem_tuning/
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/#async-io-api



śr., 29 wrz 2021 o 08:20 Sanket Agrawal <sanket.agra...@infosys.com>
napisał(a):

> Hi Ragini,
>
>
>
> For measuring time in an async we have put a logger as the first and the
> last statement in asyncInvoke and for measuring time between the asyncs
> we are simply subtracting the message2’s start time and message1’s end
> time. Also, we are using 1 as the parallelism.
>
>
>
> Please let me know if you need any other information or if you have any
> recommendations on improving the approach.
>
>
>
> Thanks,
>
> Sanket Agrawal
>
>
>
> *From:* Ragini Manjaiah <ragini.manja...@gmail.com>
> *Sent:* Wednesday, September 29, 2021 11:17 AM
> *To:* Sanket Agrawal <sanket.agra...@infosys.com>
> *Cc:* Piotr Nowojski <pnowoj...@apache.org>; user@flink.apache.org
> *Subject:* Re: Event is taking a lot of time between the operators
>
>
>
> [**EXTERNAL EMAIL**]
>
> Hi Sanket,
>
>  I have a similar use case. how are you measuring the time for Async1`
> function to return the result and external api call
>
>
>
> On Wed, Sep 29, 2021 at 10:47 AM Sanket Agrawal <
> sanket.agra...@infosys.com> wrote:
>
> Hi @Piotr Nowojski <pnowoj...@apache.org>,
>
>
>
> Thank you for replying back. Yes, first async is taking between 1300-1500
> milliseconds but that is called on a CompletableFuture.*supplyAsync *and
> the Async Capacity is set to 1000.
>
>
>
> *Async Code Structure*: Inside asyncInvoke we are calling
> CompletableFuture.*supplyAsync *and inside* supplyAsync *we are calling
> an external API which is taking around 1005ms to 1040ms. Rest of the code
> for request creation/response validation is also inside the* supplyAsync *and
> is taking around 250ms.
>
>
>
> This way we tried that the main Async thread(as the async does not uses
> multiple threads directly) is available for the next message as soon as it
> calls CompletableFuture.supplyAsync on the current message.
>
>
>
> Thanks,
>
> Sanket Agrawal
>
>
>
> *From:* Piotr Nowojski <pnowoj...@apache.org>
> *Sent:* Tuesday, September 28, 2021 8:02 PM
> *To:* Sanket Agrawal <sanket.agra...@infosys.com>
> *Cc:* user@flink.apache.org
> *Subject:* Re: Event is taking a lot of time between the operators
>
>
>
> [**EXTERNAL EMAIL**]
>
> Hi,
>
>
>
> With Flink 1.8.0 I'm not sure how reliable the backpressure status is in
> the WebUI when it comes to the Async operators. If I remember correctly
> until around Flink 1.10 (+/- 2 version) backpressure monitoring was
> checking for thread dumps stuck in requesting Flink's network memory
> buffers. If in your job AsyncFunction is the source of a backpressure, it
> would be skipped and not reported. For analysing backpressure I would
> highly recommend upgrading to Flink 1.13.x as it has greatly improved
> tooling for that [1]. And in that version AsynFunctions are definitely
> handled correctly. Since Flink 1.10 I believe you can use the
> `isBackPressured` metric. In previous versions you would have to rely on
> buffer usage metrics as described here [2].
>
>
>
>
>
> [1] https://flink.apache.org/2021/07/07/backpressure.html
> <https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fflink.apache.org%2F2021%2F07%2F07%2Fbackpressure.html&data=04%7C01%7Csanket.agrawal%40infosys.com%7C3c523d23c444478d85b908d9830caf3e%7C63ce7d592f3e42cda8ccbe764cff5eb6%7C0%7C0%7C637684912775947321%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2BzEolZuvsAgudziPqGzqFuDGRZbHR2hu9D%2F9rERLwk8%3D&reserved=0>
>
> [2]
> https://flink.apache.org/2019/07/23/flink-network-stack-2.html#network-metrics
> <https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fflink.apache.org%2F2019%2F07%2F23%2Fflink-network-stack-2.html%23network-metrics&data=04%7C01%7Csanket.agrawal%40infosys.com%7C3c523d23c444478d85b908d9830caf3e%7C63ce7d592f3e42cda8ccbe764cff5eb6%7C0%7C0%7C637684912775957276%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Xi2g%2BQ6V0AtftHMxYrHgELNn7c%2BKL0iXdacAamI%2F3A0%3D&reserved=0>
>
>
>
> Apart of the back pressure, part of the problem might be simply how long
> does it take for `Async1` function to return the result. Have you checked
> that? Isn't it taking a couple of seconds?
>
>
>
> Best,
>
> Piotrek
>
>
>
> wt., 28 wrz 2021 o 15:55 Sanket Agrawal <sanket.agra...@infosys.com>
> napisał(a):
>
> Hi All,
>
>
>
> I am new to Flink. While developing a Flink application We observed that
> our message is taking around 10 seconds between the two Async operators.
> Below are the details.
>
>
>
>    - *Flink Flow*: Kinesis Source -> Process -> Async1 -> Async2 ->
>    Process -> Kinesis Sink
>    - *Environment*: Amazon KDA. 1 Kinesis Processing Unit (1vCore & 4GB
>    ram), and 1 parallelism.
>    - *Flink Version*: 1.8.0
>    - *Backpressure*: Flink dashboard shows that backpressure is *OK.*
>    - *Input rate: *60 messages per second.
>
>
>
> Any kind of pointers/help will be very useful.
>
>
>
> Thanks,
>
> Sanket Agrawal
>
>
>
>

Reply via email to