Hi,

Yes. Back-pressure from AsyncOperator should be correctly reported via the
isBackPressured and backPressuredTimeMsPerSecond metrics, and by extension in
the WebUI, starting from 1.13.
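
As a rough illustration of how the time-based metrics discussed in this thread
could be consumed by external tooling, here is a minimal Python sketch. It
assumes the per-subtask values of backPressuredTimeMsPerSecond,
busyTimeMsPerSecond and idleTimeMsPerSecond have already been collected (for
example through a metrics reporter). The function names and the
SKEW_SPREAD_MS threshold are hypothetical and are not part of Flink.

```python
from statistics import mean

# Hypothetical threshold (ms per second); not a Flink setting, tune per job.
SKEW_SPREAD_MS = 300


def dominant_state(back_pressured_ms, busy_ms, idle_ms):
    """Return the state in which a subtask spent most of the last second.

    Each argument is the value of the corresponding *TimeMsPerSecond metric.
    On a tie, the earlier entry (back-pressured, then busy) wins.
    """
    states = {
        "back-pressured": back_pressured_ms,
        "busy": busy_ms,
        "idle": idle_ms,
    }
    return max(states, key=states.get)


def looks_skewed(busy_ms_per_subtask):
    """Heuristic: flag a task whose busiest subtask is far above the average."""
    if len(busy_ms_per_subtask) < 2:
        return False
    spread = max(busy_ms_per_subtask) - mean(busy_ms_per_subtask)
    return spread >= SKEW_SPREAD_MS
```

A check like looks_skewed is only a heuristic for the "data skew" hint
mentioned further down the thread; a single sample can be noisy, so averaging
over a longer interval before applying it is probably wise.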

Piotrek

On Mon, Apr 12, 2021 at 23:17 Lu Niu <qqib...@gmail.com> wrote:

> Hi, Piotr
>
> Thanks for your detailed reply! The thread linked below mentions that
> back-pressure generated by AsyncOperator cannot be observed in the Flink UI
> in 1.9.1. Is this fixed in the latest version? Thank you!
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Async-Function-Not-Generating-Backpressure-td26766.html
>
> Best
> Lu
>
> On Tue, Apr 6, 2021 at 11:14 PM Piotr Nowojski <pnowoj...@apache.org>
> wrote:
>
> > Hi,
> >
> > Yes, you can use `isBackPressured` to monitor a task's back-pressure.
> > However keep in mind:
> > a) You will miss the nicer visualization of this information that is
> > present in 1.13's WebUI.
> > b) `isBackPressured` is a sampling based metric. If your job has varying
> > load, for example all windows firing at the same processing time, every
> > couple of seconds, causing intermittent back-pressure, this metric will
> > show it randomly as `true` or `false`.
> > c) `isBackPressured` is slightly less accurate than
> > `backPressuredTimeMsPerSecond`. In some corner cases it can briefly return
> > `true` while a task is still running, whereas the time-based metrics work
> > in a different, much more accurate way.
> >
> > About backporting the patches: if you want to create a custom Flink build,
> > it should be doable. There will be some conflicts for sure, so you will
> > need to understand Flink's code.
> >
> > Best,
> > Piotrek
> >
> > On Wed, Apr 7, 2021 at 02:32 Lu Niu <qqib...@gmail.com> wrote:
> >
> > > Hi, Piotr
> > >
> > > Thanks for replying!
> > >
> > > We don't have a plan to upgrade to 1.13 in the short term. We are using
> > > Flink 1.11 and I noticed there is a metric called isBackPressured. Is that
> > > enough to solve 1? If not, would backporting the patches for
> > > backPressuredTimeMsPerSecond, busyTimeMsPerSecond and idleTimeMsPerSecond
> > > work? And do you have an estimate of how difficult that is?
> > >
> > >
> > > Best
> > > Lu
> > >
> > >
> > >
> > > On Tue, Apr 6, 2021 at 12:18 AM Piotr Nowojski <pnowoj...@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Lately we overhauled the backpressure detection [1] and a screenshot
> > > > preview of those efforts is attached here [2]. I encourage you to check
> > > > the 1.13 RC0 build and see how the current mechanism works for you [3].
> > > > To support those WebUI changes we have added a couple of new metrics:
> > > > backPressuredTimeMsPerSecond, busyTimeMsPerSecond and
> > > > idleTimeMsPerSecond.
> > > >
> > > > 1. I believe that solves 1.
> > > > 2. This still requires a bit of manual investigation. Once you locate the
> > > > backpressuring task, you can check the detailed subtask stats to see if
> > > > all parallel instances are uniformly backpressured/busy or not. If you
> > > > would like a hint such as "it looks like you have a data skew in Task
> > > > XYZ", I believe that could be added to the WebUI.
> > > > 3. The tricky part is how to display this kind of information. Currently
> > > > I would recommend just exporting/reporting the
> > > > backPressuredTimeMsPerSecond, busyTimeMsPerSecond and idleTimeMsPerSecond
> > > > metrics for every task to an external system and displaying them, for
> > > > example, in Grafana.
> > > >
> > > > The blog post you are referencing is quite outdated, especially with
> > > > those new changes from 1.13. I'm hoping to write a new one pretty soon.
> > > >
> > > > Piotrek
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-14712
> > > > [2]
> > > > https://issues.apache.org/jira/browse/FLINK-14814?focusedCommentId=17256926&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17256926
> > > > [3]
> > > > http://mail-archives.apache.org/mod_mbox/flink-user/202104.mbox/%3c1d2412ce-d4d0-ed50-6181-1b610e16d...@apache.org%3E
> > > >
> > > > On Mon, Apr 5, 2021 at 23:20 Lu Niu <qqib...@gmail.com> wrote:
> > > >
> > > > > Hi, Flink dev
> > > > >
> > > > > Lately, we have wanted to develop some tools to:
> > > > > 1. Show the back-pressured operator without manual operation
> > > > > 2. Provide suggestions to mitigate back pressure after checking data
> > > > > skew, external service RPC etc.
> > > > > 3. Show back pressure history
> > > > >
> > > > > Could anyone share their experience with such tooling?
> > > > > Also, I notice backpressure monitoring and detection is mentioned
> > > > > across multiple places. Could someone help to explain how these connect
> > > > > to each other? Maybe some of them are outdated? Thanks!
> > > > >
> > > > > 1. The official doc introduces monitoring back pressure through the web
> > > > > UI.
> > > > > https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/monitoring/back_pressure.html
> > > > > 2. In https://flink.apache.org/2019/07/23/flink-network-stack-2.html, it
> > > > > says the outPoolUsage and inPoolUsage metrics can be used to determine
> > > > > back pressure.
> > > > > 3. The latest Flink version introduces a metric called "isBackPressured",
> > > > > but I didn't find related documentation on its usage.
> > > > >
> > > > > Best
> > > > > Lu
> > > > >
> > > >
> > >
> >
>
