Re: How does Flink handle backpressure in EMR

Nguyen, Michael Thu, 05 Dec 2019 10:41:50 -0800

Hi Piotrek,

For the second article, I understand I can monitor the backpressure status via 
the Flink Web UI. Can I refer to the same metrics in my Flink jobs itself? For 
example, can I put in an if statement to check for when outPoolUsage reaches 
100%?


Thank you,
Michael

From: Piotr Nowojski <pi...@ververica.com>
Date: Thursday, December 5, 2019 at 10:27 AM
To: Michael Nguyen <michael.nguye...@t-mobile.com>
Cc: Khachatryan Roman <khachatryan.ro...@gmail.com>, "user@flink.apache.org" 
<user@flink.apache.org>
Subject: Re: How does Flink handle backpressure in EMR

[External]

Hi,

If you are using event time and watermarks, you can monitor the delays using 
`currentInputWatermark` metric [1]. If not (or alternatively), this blog post 
[2] describes how to check back pressure status [2] for Flink up to 1.9. In 
Flink 1.10 there will be an additional new metric for that [3].

Piotrek

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/debugging_event_time.html<https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fci.apache.org%2Fprojects%2Fflink%2Fflink-docs-release-1.9%2Fmonitoring%2Fdebugging_event_time.html&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570374404&sdata=GS8PCzeohW95e%2BT5phGljgHdMArImMjqBxSkR79dIzw%3D&reserved=0>
[2] 
https://flink.apache.org/2019/07/23/flink-network-stack-2.html<https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fflink.apache.org%2F2019%2F07%2F23%2Fflink-network-stack-2.html&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570384404&sdata=fA%2FBiBnL%2BdqLN4FJ9t2%2B71b7M7Ii7rjfrsmRUlqgIiA%3D&reserved=0>
[3] 
https://issues.apache.org/jira/browse/FLINK-14813<https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FFLINK-14813&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570384404&sdata=NSe%2BN9ur9u3YLEGqqq%2F%2FBH8XXgd2jtlwV67LUyXmA8A%3D&reserved=0>


On 5 Dec 2019, at 19:11, Nguyen, Michael 
<michael.nguye...@t-mobile.com<mailto:michael.nguye...@t-mobile.com>> wrote:

Hi Roman,

So right now we have a couple Flink jobs that consumes data from one Kinesis 
data stream. These jobs vary from a simple dump into a PostgreSQL table to 
calculating anomalies in a 30 minute window.

One large scenario we were worried about was what if one of our jobs was taking 
a long time to process the Kinesis stream data? How would we detect this 
scenario from within our Flink job?

We do not want our Flink jobs to lag too far from the latest point in our 
Kinesis stream as we are trying to deliver information in (near) real-time.

From: Khachatryan Roman 
<khachatryan.ro...@gmail.com<mailto:khachatryan.ro...@gmail.com>>
Date: Thursday, December 5, 2019 at 9:47 AM
To: Michael Nguyen 
<michael.nguye...@t-mobile.com<mailto:michael.nguye...@t-mobile.com>>
Cc: Piotr Nowojski <pi...@ververica.com<mailto:pi...@ververica.com>>, 
"user@flink.apache.org<mailto:user@flink.apache.org>" 
<user@flink.apache.org<mailto:user@flink.apache.org>>
Subject: Re: How does Flink handle backpressure in EMR

[External]

@Michael,
Could you please describe your topology with which operators being slow, 
back-pressured and probably skews in sources?

Regards,
Roman


On Thu, Dec 5, 2019 at 6:20 PM Nguyen, Michael 
<michael.nguye...@t-mobile.com<mailto:michael.nguye...@t-mobile.com>> wrote:
Thank you for the response Roman and Piotrek!

@Roman - can you clarify on what you mean when you mentioned Flink propagating 
it back to the sources?

Also, if one of my Flink operators is processing records too slowly and is 
getting further away from the latest record of my source data stream, is there 
a way to detect this slow processing in Flink? Would this be detected by 
Flink's backpressure mechanism?

Thanks,
Michael

On 12/5/19, 7:57 AM, "Piotr Nowojski" 
<pi...@data-artisans.com<mailto:pi...@data-artisans.com> on behalf of 
pi...@ververica.com<mailto:pi...@ververica.com>> wrote:

    [External]


    Hi Michael,

    As Roman pointed out Flink currently doesn’t support the auto-scaling. It’s 
on our roadmap but it requires quite a bit of preliminary work to happen before.

    Piotrek

    > On 5 Dec 2019, at 15:32, r_khachatryan 
<khachatryan.ro...@gmail.com<mailto:khachatryan.ro...@gmail.com>> wrote:
    >
    > Hi Michael
    >
    > Flink *does* detect backpressure but currently, it only propagates it back
    > to sources.
    > And so it doesn't support auto-scaling.
    >
    > Regards,
    > Roman
    >
    >
    > Nguyen, Michael wrote
    >> How does Flink handle backpressure (caused by an increase in traffic) in 
a
    >> Flink job when it’s being hosted in an EMR cluster? Does Flink detect the
    >> backpressure and auto-scales the EMR cluster to handle the workload to
    >> relieve the backpressure? Once the backpressure is gone, then the EMR
    >> cluster would scale back down?
    >
    >
    >
    >
    >
    > --
    > Sent from: 
https://nam02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fapache-flink-user-mailing-list-archive.2336050.n4.nabble.com%2F&amp;data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cfda964dacbf04cc2d2cb08d7799bc347%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111582216065152&amp;sdata=FIHbAWbkH7Tq8AjG%2BVSZoxNrOl0Bn6SukN%2BY1PyhSHg%3D&amp;reserved=0<https://nam02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fapache-flink-user-mailing-list-archive.2336050.n4.nabble.com%2F&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570394400&sdata=j0YcpGEm3Lour%2FbVi2WX7hSH1vtxcBRUXNgeNnrCgBU%3D&reserved=0>

Re: How does Flink handle backpressure in EMR

Reply via email to