Hi,

Thanks for bringing this up! Generally speaking +1 for the proposal. I have
only one suggestion for the draft.

In past years, when I created performance regression tickets, I set their
priority to Blocker, and I propose adding this to the instructions as a
general convention. My reasoning is that a performance regression is the
equivalent of a constantly failing test on the master branch. If it is not
fixed ASAP, it might hide another performance regression that happens in
the meantime. We have had a couple of exactly these problems in the past.
Once the original regression was fixed, for example a month later, it
became clearly visible that some other regression had happened during the
time the benchmark results were incorrect. That second one was much more
difficult to find, because we couldn't isolate a small set of potential
changes that caused it (the suspected range was not ~6 hours, but ~1
month).

Best,
Piotrek


Thu, 19 Jan 2023 at 10:11 Yuan Mei <yuanmei.w...@gmail.com> wrote:

> Hey Yanfei,
>
> Thanks so much for the efforts driving the whole process. It's great to see
> that the performance benchmarks are indeed useful to help find regressions.
> This is a discussion thread separated from the original performance
> benchmark announcement thread [1]. Let's continue here so that more people
> are aware of this change.
>
> Overall, the instructions and proposal are good. Currently, Yanfei watches
> the benchmarks on a volunteer basis. However, she can only manage to check
> once or twice a week, so I think it is important to integrate the
> performance-watching process with the release-management process.
>
> From what I can see, there are still a couple of things that need to be
> addressed:
> - Improve the benchmark's stability [2]; Yanfei is working on that.
> - Have someone other than Yanfei and me (maybe the release managers) try
> out the instructions to see whether it is indeed clear how to find the
> suspicious commits causing regressions.
>
> Looking forward to a more detailed discussion.
>
> Best
> Yuan
>
>
>
> [1] https://www.mail-archive.com/dev@flink.apache.org/msg61178.html
> [2] https://issues.apache.org/jira/browse/FLINK-29825
>
> On Thu, Jan 19, 2023 at 4:02 PM Yanfei Lei <fredia...@gmail.com> wrote:
>
> > Hi devs,
> >
> > I'd like to start a discussion about incorporating performance
> > regression monitoring into the routine process. Flink benchmarks are
> > periodically executed on http://codespeed.dak8s.net:8080 to monitor
> > Flink performance. In late Oct '22, a new Slack channel,
> > #flink-dev-benchmarks, was created for notifications of performance
> > regressions. It helped us find 2 build failures [1,2] and 5 performance
> > regressions [3,4,5,6,7] in the past 3 months, which is very valuable
> > for ensuring the quality of the code.
> >
> > Some release managers (cc @Matthias, @Martijn, @Qingsheng) have
> > proposed incorporating performance regression monitoring into release
> > management. I think this makes sense for performance stability (as with
> > CI stability): almost every release has some tickets about performance
> > optimizations, and performance monitoring can effectively prevent
> > performance regressions and track the performance improvements of each
> > release. So I am starting this discussion to gather everyone's
> > suggestions.
> >
> > In the past, I checked the Slack notifications once a week, and I have
> > written a draft [8] on how to deal with performance regressions, based
> > on the experience of some contributors and my own. If the above proposal
> > is considered acceptable, I'd like to put it in the community wiki [9].
> >
> > Looking forward to your feedback!
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-29883
> > [2] https://issues.apache.org/jira/browse/FLINK-30015
> > [3] https://issues.apache.org/jira/browse/FLINK-29886
> > [4] https://issues.apache.org/jira/browse/FLINK-30181
> > [5] https://issues.apache.org/jira/browse/FLINK-30623
> > [6] https://issues.apache.org/jira/browse/FLINK-30624
> > [7] https://issues.apache.org/jira/browse/FLINK-30625
> > [8] https://docs.google.com/document/d/1jTTJHoCTf8_LAjviyAY3Fi7p-tYtl_zw7rJKV4V6T_c/edit?usp=sharing
> > [9] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115511847
> >
> > Best,
> > Yanfei
> >
>
