Hi devs,

I'd like to start a discussion about incorporating performance
regression monitoring into the routine release process. Flink
benchmarks are executed periodically on http://codespeed.dak8s.net:8080
to monitor Flink's performance. In late Oct '22, a new Slack channel
#flink-dev-benchmarks was created for notifications of performance
regressions. It has helped us find 2 build failures[1,2] and 5
performance regressions[3,4,5,6,7] in the past 3 months, which is
very valuable for ensuring code quality.

Some release managers (cc @Matthias, @Martijn, @Qingsheng) have
proposed incorporating performance regression monitoring into release
management. I think this makes sense for performance stability (just
like CI stability): almost every release contains some tickets about
performance optimizations, and continuous monitoring can effectively
prevent performance regressions and track the performance improvements
of each release. So I'm starting this discussion to pick everyone's
brains for suggestions.

Over the past months, I have checked the Slack notifications once a
week, and I have summarized a draft[8] on how to deal with performance
regressions, based on my own experience and input from other
contributors. If this proposal is considered acceptable, I'd like to
add it to the community wiki[9].
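
To make the triage step a bit more concrete, below is a minimal sketch
(in Java) of the kind of threshold check behind such notifications: flag
a benchmark when its current score drops noticeably below the baseline.
The class name, benchmark names, scores, and the 10% threshold are my own
illustrative assumptions, not the actual codespeed/benchmark
infrastructure.

import java.util.Map;

/**
 * Illustrative sketch of a threshold-based regression check.
 * All names and numbers here are assumptions for illustration only.
 */
public class RegressionCheck {

    // Flag a regression when the current score drops more than 10%
    // below the baseline (higher score = better throughput).
    private static final double THRESHOLD = 0.10;

    public static void main(String[] args) {
        // Hypothetical median scores (e.g. records/ms) per benchmark.
        Map<String, Double> baseline = Map.of(
                "serializerHeavyString", 1000.0,
                "checkpointSingleInput", 500.0);
        Map<String, Double> current = Map.of(
                "serializerHeavyString", 850.0,  // ~15% drop -> flagged
                "checkpointSingleInput", 490.0); // ~2% drop  -> noise

        baseline.forEach((name, base) -> {
            double cur = current.getOrDefault(name, 0.0);
            double drop = (base - cur) / base;
            if (drop > THRESHOLD) {
                // In the real setup this would become a Slack notification.
                System.out.printf("Possible regression in %s: %.1f%% drop%n",
                        name, drop * 100);
            }
        });
    }
}

In practice the signal is noisier than a single threshold (hence the
weekly manual check described above), but the basic comparison against a
baseline is the same idea.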

Looking forward to your feedback!

[1] https://issues.apache.org/jira/browse/FLINK-29883
[2] https://issues.apache.org/jira/browse/FLINK-30015
[3] https://issues.apache.org/jira/browse/FLINK-29886
[4] https://issues.apache.org/jira/browse/FLINK-30181
[5] https://issues.apache.org/jira/browse/FLINK-30623
[6] https://issues.apache.org/jira/browse/FLINK-30624
[7] https://issues.apache.org/jira/browse/FLINK-30625
[8] https://docs.google.com/document/d/1jTTJHoCTf8_LAjviyAY3Fi7p-tYtl_zw7rJKV4V6T_c/edit?usp=sharing
[9] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115511847

Best,
Yanfei
