Hi Dong,

The main issue with an automated tool at the moment is that some benchmarks are quite noisy, and performance regressions often fall within the noise of a given benchmark. Our existing tooling cannot handle those cases. Until we address this, I think it will have to remain a manual process. There is a ticket mentioned by Yuan [1] where I have written a comment and a proposal on how to improve automatic performance regression detection.
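To illustrate the kind of rule I have in mind, here is a minimal, hypothetical sketch of a noise-aware check. This is not our codespeed tooling; the function name, thresholds, and data are made up. The idea is to alert only when a drop exceeds both the benchmark's own historical noise band and a practically relevant margin:

from statistics import mean, stdev

def is_regression(baseline: list[float], recent: list[float],
                  min_sigma: float = 3.0, min_drop_pct: float = 2.0) -> bool:
    """Flag a regression only if recent scores drop below the baseline
    by more than the baseline's noise band (min_sigma standard
    deviations) AND by a practically relevant margin (min_drop_pct).
    Scores are throughput-style, i.e. higher is better."""
    base_mean = mean(baseline)
    drop = base_mean - mean(recent)
    if drop < min_sigma * stdev(baseline):
        return False  # within the benchmark's own noise: no alert
    return drop / base_mean * 100.0 >= min_drop_pct

# A noisy benchmark: a 5% average drop is still inside the 3-sigma band.
print(is_regression([100.0, 92.0, 108.0, 97.0, 103.0],
                    [95.0, 94.0, 96.0]))  # -> False, no alert

The hard part is that for noisy benchmarks such a rule either misses real regressions or, with tighter thresholds, produces false alarms, which is exactly why this is still manual.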
Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-29825?focusedCommentId=17679077&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17679077

On Mon, Jan 30, 2023 at 15:31 Dong Lin <lindon...@gmail.com> wrote:

> Hi Yanfei,
>
> Thanks for driving the benchmark monitoring effort! The Google doc and the
> community wiki look pretty good.
>
> According to Yuan's comment, it seems that we currently watch the
> benchmark results manually to detect regressions. Have we considered
> automating this process, e.g. by exporting the nightly benchmark results
> to a database and using scripts to detect regressions based on
> pre-defined rules?
>
> This approach is probably more scalable and accurate in the long term,
> and I had a good experience working with such a regression detection
> tool in my past job.
>
> Thanks,
> Dong
>
>
>
> On Thu, Jan 19, 2023 at 4:02 PM Yanfei Lei <fredia...@gmail.com> wrote:
>
> > Hi devs,
> >
> > I'd like to start a discussion about incorporating performance
> > regression monitoring into the routine release process. Flink
> > benchmarks are periodically executed on http://codespeed.dak8s.net:8080
> > to monitor Flink performance. In late Oct '22, a new Slack channel,
> > #flink-dev-benchmarks, was created for notifications of performance
> > regressions. It has helped us find 2 build failures [1,2] and 5
> > performance regressions [3,4,5,6,7] in the past 3 months, which is
> > very valuable for ensuring the quality of the code.
> >
> > Some release managers (cc @Matthias, @Martijn, @Qingsheng) have
> > proposed incorporating performance regression monitoring into release
> > management. I think this makes sense for performance stability (much
> > like CI stability): almost every release contains tickets about
> > performance optimizations, so performance monitoring can effectively
> > prevent performance regressions and track the performance improvements
> > of each release. I am starting this discussion to pick everyone's
> > brain for suggestions.
> >
> > In the past, I checked the Slack notifications once a week, and I have
> > summarized a draft [8] (
> > https://docs.google.com/document/d/1jTTJHoCTf8_LAjviyAY3Fi7p-tYtl_zw7rJKV4V6T_c/edit?usp=sharing
> > ) on how to deal with performance regressions, based on my own
> > experience and that of other contributors. If the proposal is
> > considered acceptable, I'd like to put it in the community wiki [9].
> >
> > Looking forward to your feedback!
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-29883
> > [2] https://issues.apache.org/jira/browse/FLINK-30015
> > [3] https://issues.apache.org/jira/browse/FLINK-29886
> > [4] https://issues.apache.org/jira/browse/FLINK-30181
> > [5] https://issues.apache.org/jira/browse/FLINK-30623
> > [6] https://issues.apache.org/jira/browse/FLINK-30624
> > [7] https://issues.apache.org/jira/browse/FLINK-30625
> > [8] https://docs.google.com/document/d/1jTTJHoCTf8_LAjviyAY3Fi7p-tYtl_zw7rJKV4V6T_c/edit?usp=sharing
> > [9] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115511847
> >
> > Best,
> > Yanfei
> >