Hi Piotrek,

For the second article, I understand I can monitor the backpressure status via the Flink Web UI. Can I refer to the same metrics from within my Flink job itself? For example, can I put in an if statement to check for when outPoolUsage reaches 100%?
Thank you,
Michael

From: Piotr Nowojski <pi...@ververica.com>
Date: Thursday, December 5, 2019 at 10:27 AM
To: Michael Nguyen <michael.nguye...@t-mobile.com>
Cc: Khachatryan Roman <khachatryan.ro...@gmail.com>, "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: How does Flink handle backpressure in EMR

[External]

Hi,

If you are using event time and watermarks, you can monitor the delays using the `currentInputWatermark` metric [1]. If not (or alternatively), this blog post [2] describes how to check the back pressure status for Flink up to 1.9. In Flink 1.10 there will be an additional new metric for that [3].

Piotrek

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/debugging_event_time.html
[2] https://flink.apache.org/2019/07/23/flink-network-stack-2.html
[3] https://issues.apache.org/jira/browse/FLINK-14813

On 5 Dec 2019, at 19:11, Nguyen, Michael
<michael.nguye...@t-mobile.com> wrote:

Hi Roman,

So right now we have a couple of Flink jobs that consume data from one Kinesis data stream. These jobs vary from a simple dump into a PostgreSQL table to calculating anomalies in a 30-minute window.

One large scenario we were worried about: what if one of our jobs takes a long time to process the Kinesis stream data? How would we detect this scenario from within our Flink job? We do not want our Flink jobs to lag too far behind the latest point in our Kinesis stream, as we are trying to deliver information in (near) real time.

From: Khachatryan Roman <khachatryan.ro...@gmail.com>
Date: Thursday, December 5, 2019 at 9:47 AM
To: Michael Nguyen <michael.nguye...@t-mobile.com>
Cc: Piotr Nowojski <pi...@ververica.com>, "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: How does Flink handle backpressure in EMR

[External]

@Michael,

Could you please describe your topology: which operators are slow or back-pressured, and whether there is any skew in the sources?

Regards,
Roman

On Thu, Dec 5, 2019 at 6:20 PM Nguyen, Michael <michael.nguye...@t-mobile.com> wrote:

Thank you for the response Roman and Piotrek!

@Roman - can you clarify what you meant when you mentioned Flink propagating it back to the sources? Also, if one of my Flink operators is processing records too slowly and is getting further away from the latest record of my source data stream, is there a way to detect this slow processing in Flink? Would this be detected by Flink's backpressure mechanism?
Thanks,
Michael

On 12/5/19, 7:57 AM, "Piotr Nowojski" <pi...@data-artisans.com on behalf of pi...@ververica.com> wrote:

[External]

Hi Michael,

As Roman pointed out, Flink currently doesn’t support auto-scaling. It’s on our roadmap, but it requires quite a bit of preliminary work first.

Piotrek

> On 5 Dec 2019, at 15:32, r_khachatryan <khachatryan.ro...@gmail.com> wrote:
>
> Hi Michael
>
> Flink *does* detect backpressure but currently, it only propagates it back
> to sources.
> And so it doesn't support auto-scaling.
>
> Regards,
> Roman
>
> Nguyen, Michael wrote
>> How does Flink handle backpressure (caused by an increase in traffic) in a
>> Flink job when it’s being hosted in an EMR cluster? Does Flink detect the
>> backpressure and auto-scale the EMR cluster to handle the workload to
>> relieve the backpressure? Once the backpressure is gone, would the EMR
>> cluster scale back down?
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
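
As a rough illustration of the outPoolUsage check asked about at the top of this thread: the metrics shown in the Web UI are also served by Flink's monitoring REST API, so a threshold check can be run from a small script outside the job. The sketch below only parses a response body and flags subtasks whose output buffer pool is full. It assumes the response is a JSON array of objects with "id" and "value" fields, that values arrive as strings, and that per-subtask metric ids look like `0.buffers.outPoolUsage`. The sample payload and the function name `saturated_subtasks` are made up for illustration, so check the names against your Flink version.

```python
import json

# Assumed metric name; verify it against the metrics your Flink version exposes.
METRIC_SUFFIX = "buffers.outPoolUsage"


def saturated_subtasks(response_body, threshold=1.0):
    """Return {metric id: usage} for output pools at or above the threshold."""
    saturated = {}
    for entry in json.loads(response_body):
        metric_id = entry["id"]
        if not metric_id.endswith(METRIC_SUFFIX):
            continue  # skip unrelated metrics such as inPoolUsage
        usage = float(entry["value"])  # values are assumed to arrive as strings
        if usage >= threshold:
            saturated[metric_id] = usage
    return saturated


# Hypothetical response body, not captured from a real cluster.
sample = json.dumps([
    {"id": "0.buffers.outPoolUsage", "value": "1.0"},
    {"id": "1.buffers.outPoolUsage", "value": "0.25"},
    {"id": "0.buffers.inPoolUsage", "value": "1.0"},
])

print(saturated_subtasks(sample))
```

A full output pool on a subtask is only a hint that something downstream of it is slow; it does not by itself identify the bottleneck operator, so a check like this is probably best treated as an alerting signal rather than a diagnosis.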