Hi,

Yes, back-pressure from AsyncOperator should be correctly reported via the isBackPressured and backPressuredTimeMsPerSecond metrics and, by extension, in the WebUI from 1.13 onwards.
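The two metrics named above can disagree under intermittent load. The small simulation below is a hypothetical model, not Flink code, and is only meant to show why. The helper names and the "blocked for 300 ms at the start of every second" pattern are assumptions chosen for illustration. A sampling-style reading (like `isBackPressured`) depends on the instant you look. A time-based reading (like `backPressuredTimeMsPerSecond`) accumulates blocked time over the whole second.

```python
# Hypothetical model, NOT Flink code: a task is back-pressured during
# [start, end) intervals given in milliseconds.

def is_back_pressured(intervals, t_ms):
    """Sampling-style reading: is the task blocked at the instant t_ms?"""
    return any(start <= t_ms < end for start, end in intervals)


def back_pressured_ms_per_second(intervals, window_start_ms, window_ms=1000):
    """Time-based reading: blocked milliseconds per second over one window."""
    window_end = window_start_ms + window_ms
    blocked = 0
    for start, end in intervals:
        overlap = min(end, window_end) - max(start, window_start_ms)
        blocked += max(0, overlap)
    return blocked * 1000 / window_ms


# Assumed load pattern: blocked for the first 300 ms of every second,
# e.g. because many windows fire at the same processing time.
intervals = [(s, s + 300) for s in range(0, 10_000, 1000)]

samples = [is_back_pressured(intervals, t) for t in (150, 650, 1200, 1900)]
rate = back_pressured_ms_per_second(intervals, 2000)
print(samples)  # [True, False, True, False]
print(rate)     # 300.0
```

The sampled booleans flip depending on timing. The time-based value stays at 300 ms/s for any one-second window in this model.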
Piotrek

On Mon, Apr 12, 2021 at 11:17 PM Lu Niu <qqib...@gmail.com> wrote:

> Hi, Piotr
>
> Thanks for your detailed reply! It is mentioned here that we cannot observe
> back-pressure generated by AsyncOperator in the Flink UI in 1.9.1. Is this
> fixed in the latest version? Thank you!
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Async-Function-Not-Generating-Backpressure-td26766.html
>
> Best
> Lu
>
> On Tue, Apr 6, 2021 at 11:14 PM Piotr Nowojski <pnowoj...@apache.org>
> wrote:
>
> > Hi,
> >
> > Yes, you can use `isBackPressured` to monitor a task's back-pressure.
> > However, keep in mind:
> > a) You will miss out on the nicer visualization of this information that
> > is available in the 1.13 WebUI.
> > b) `isBackPressured` is a sampling-based metric. If your job has a varying
> > load, for example all windows firing at the same processing time every
> > couple of seconds and causing intermittent back-pressure, this metric will
> > randomly show `true` or `false`.
> > c) `isBackPressured` is slightly less accurate than
> > `backPressuredTimeMsPerSecond`. There are some corner cases in which it can
> > briefly return `true` while the task is actually still running, whereas the
> > time-based metrics are measured in a different, much more accurate way.
> >
> > As for backporting the patches, it should be doable if you want to create
> > a custom Flink build. There will certainly be some conflicts, so you will
> > need to understand Flink's code.
> >
> > Best,
> > Piotrek
> >
> > On Wed, Apr 7, 2021 at 2:32 AM Lu Niu <qqib...@gmail.com> wrote:
> >
> > > Hi, Piotr
> > >
> > > Thanks for replying!
> > >
> > > We don't plan to upgrade to 1.13 in the short term. We are using Flink
> > > 1.11 and I noticed there is a metric called isBackPressured. Is that
> > > enough to solve 1? If not, would backporting the patches for
> > > backPressuredTimeMsPerSecond, busyTimeMsPerSecond and idleTimeMsPerSecond
> > > work? And do you have an estimate of how difficult it would be?
> > >
> > > Best
> > > Lu
> > >
> > > On Tue, Apr 6, 2021 at 12:18 AM Piotr Nowojski <pnowoj...@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > We recently overhauled the back-pressure detection [1]; a screenshot
> > > > preview of those efforts is attached here [2]. I encourage you to check
> > > > the 1.13 RC0 build and see how the current mechanism works for you [3].
> > > > To support those WebUI changes we have added a couple of new metrics:
> > > > backPressuredTimeMsPerSecond, busyTimeMsPerSecond and
> > > > idleTimeMsPerSecond.
> > > >
> > > > 1. I believe that solves 1.
> > > > 2. This still requires a bit of manual investigation. Once you locate
> > > > the back-pressuring task, you can check the detailed subtask stats to
> > > > see whether all parallel instances are uniformly back-pressured/busy or
> > > > not. If you would like to add a hint such as "it looks like you have
> > > > data skew in Task XYZ", I believe that could be added to the WebUI.
> > > > 3. The tricky part is how to display this kind of information.
> > > > Currently I would recommend simply exporting/reporting the
> > > > backPressuredTimeMsPerSecond, busyTimeMsPerSecond and
> > > > idleTimeMsPerSecond metrics for every task to an external system and
> > > > displaying them there, for example in Grafana.
> > > >
> > > > The blog post you are referencing is quite outdated, especially given
> > > > the new changes in 1.13. I'm hoping to write a new one pretty soon.
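Point 2 above (checking whether all parallel instances of a task are uniformly busy or back-pressured) could be automated with a simple heuristic. The sketch below is hypothetical and is not a Flink API. It assumes you already have one busyTimeMsPerSecond (or backPressuredTimeMsPerSecond) value per parallel subtask, for example from the external metrics system mentioned above. The ratio and minimum thresholds are illustrative assumptions, not Flink defaults.

```python
from statistics import mean


# Hypothetical helper, NOT part of Flink: flag a task when its busiest
# parallel instance is much busier than the average instance.
# ratio_threshold and min_ms are illustrative assumptions.
def looks_skewed(ms_per_subtask, ratio_threshold=2.0, min_ms=100.0):
    if not ms_per_subtask:
        return False
    peak = max(ms_per_subtask)
    if peak < min_ms:
        # Every instance is mostly idle, so there is nothing worth reporting.
        return False
    # peak >= min_ms > 0 and values are non-negative, so the mean is positive.
    return peak / mean(ms_per_subtask) >= ratio_threshold


# One hot subtask out of four: 950 / 320 ~= 2.97, so it is flagged.
print(looks_skewed([950, 120, 100, 110]))  # True
# Uniformly busy instances: overloaded, but not skewed.
print(looks_skewed([900, 880, 910, 890]))  # False
```

A uniformly high value across all instances usually points at an overloaded task rather than skew. A single outlier is what a "looks like data skew in Task XYZ" hint would key on.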
> > > >
> > > > Piotrek
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-14712
> > > > [2] https://issues.apache.org/jira/browse/FLINK-14814?focusedCommentId=17256926&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17256926
> > > > [3] http://mail-archives.apache.org/mod_mbox/flink-user/202104.mbox/%3c1d2412ce-d4d0-ed50-6181-1b610e16d...@apache.org%3E
> > > >
> > > > On Mon, Apr 5, 2021 at 11:20 PM Lu Niu <qqib...@gmail.com> wrote:
> > > >
> > > > > Hi, Flink dev
> > > > >
> > > > > Lately we have wanted to develop some tools to:
> > > > > 1. Show back-pressured operators without manual operation.
> > > > > 2. Provide suggestions to mitigate back-pressure after checking data
> > > > > skew, external service RPCs, etc.
> > > > > 3. Show back-pressure history.
> > > > >
> > > > > Could anyone share their experience with such tooling?
> > > > > Also, I noticed that back-pressure monitoring and detection is
> > > > > mentioned in multiple places. Could someone help explain how these
> > > > > connect to each other? Maybe some of them are outdated? Thanks!
> > > > >
> > > > > 1. The official docs introduce monitoring back-pressure through the
> > > > > web UI:
> > > > > https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/monitoring/back_pressure.html
> > > > > 2. https://flink.apache.org/2019/07/23/flink-network-stack-2.html says
> > > > > that the outPoolUsage and inPoolUsage metrics can be used to determine
> > > > > back-pressure.
> > > > > 3. The latest Flink version introduces a metric called
> > > > > "isBackPressured", but I didn't find any documentation on its usage.
> > > > >
> > > > > Best
> > > > > Lu
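Item 1 of the original question (showing where back-pressure comes from without manual inspection) could be approached with a simple heuristic on top of backPressuredTimeMsPerSecond. Walk the pipeline from source to sink. The likely bottleneck is the first task that is not back-pressured while its upstream neighbour is. The sketch makes several assumptions. The job is a linear chain of tasks. `TaskStats` is a hypothetical snapshot that you would fill from whatever system the metrics are exported to. The 100 ms threshold is illustrative only.

```python
from typing import List, NamedTuple, Optional


class TaskStats(NamedTuple):
    """Hypothetical per-task snapshot; field names are assumptions."""
    name: str
    back_pressured_ms: float  # backPressuredTimeMsPerSecond, e.g. max over subtasks


def find_bottleneck(chain: List[TaskStats], threshold_ms: float = 100.0) -> Optional[str]:
    """Return the first task that is NOT back-pressured while its upstream
    neighbour IS. `chain` must be ordered from source to sink."""
    for upstream, task in zip(chain, chain[1:]):
        if upstream.back_pressured_ms >= threshold_ms and task.back_pressured_ms < threshold_ms:
            return task.name
    return None  # no back-pressured -> not-back-pressured transition found


job = [
    TaskStats("source", 800.0),
    TaskStats("parse", 750.0),
    TaskStats("async-enrich", 10.0),
    TaskStats("sink", 0.0),
]
print(find_bottleneck(job))  # async-enrich
```

For a real job graph with fan-in or fan-out you would walk the DAG edges instead of a list. Storing each snapshot with a timestamp in the external system would also cover item 3 (back-pressure history).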