It looks like the most recent JDK 11 run saw a big improvement [1] in the
performance of the test. That improvement seems to be related to [2], which
is a fix for FLINK-35215 [3]. That suggests to me that the test isn't as
isolated to the performance of the code it's trying to test as would be
ideal. However, I've only just started looking at the test suite and trying
to run it locally, so I'm not very well placed to judge.

It does, however, suggest that this shouldn't be a blocker for the release.



[1] http://flink-speed.xyz/changes/?rev=c1baf07d76&exe=6&env=3
[2] https://github.com/apache/flink/commit/c1baf07d7601a683f42997dc35dfaef4e41bc928
[3] https://issues.apache.org/jira/browse/FLINK-35215

On Wed, 22 May 2024 at 00:15, Piotr Nowojski <pnowoj...@apache.org> wrote:

> Hi,
>
> Given what you wrote, that you have investigated the issue and couldn't
> find any easy explanation, I would suggest closing this ticket as "Won't
> do" or "Can not reproduce" and ignoring the problem.
>
> In the past there have been quite a few cases where some benchmark
> detected a performance regression. Sometimes those cannot be reproduced;
> other times (as is the case here), some seemingly unrelated change is
> causing the regression. The same thing happened in this benchmark many
> times in the past [1], [2], [3], [4]. Generally speaking, this benchmark
> has been in the spotlight a couple of times [5].
>
> Note that there have been cases where this benchmark did detect a
> performance regression :)
>
> My personal suspicion is that after that commons-io version bump,
> something poked the JVM/JIT to compile the string serialization code a bit
> differently, causing this regression. We have a couple of benchmarks that
> seem to be prone to such semi-intermittent issues. For example, this same
> benchmark was subject to an annoying pattern that I've spotted in quite a
> few benchmarks over the years [6]:
>
> [image: image.png]
> (https://imgur.com/a/AoygmWS)
>
> Here, benchmark results are very stable within a single JVM fork, but
> between two forks they can reach two different "stable" levels. It looks
> like there is a 50% chance of getting a stable "200 records/ms" and a 50%
> chance of "250 records/ms".
>
> A small interlude. Each of our benchmarks runs in 3 different JVM forks,
> with 10 warm-up iterations and 10 measurement iterations. Each iteration
> invokes the benchmarking method for at least one second. So by "very
> stable" results, I mean that, for example, after the 2nd or 3rd warm-up
> iteration the results stabilize to within +/-1% and stay at that level for
> the whole duration of the fork.
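
To make the "+/-1%" notion of stability above concrete, here is a minimal
Python sketch. The function names and the sample scores are hypothetical
and are not taken from the Flink benchmark code or from real results; it
only assumes that per-iteration scores (records/ms) are available as a list.

```python
from statistics import mean


def is_stable(scores, tolerance=0.01):
    """Return True if every score lies within +/-tolerance of the mean."""
    avg = mean(scores)
    return all(abs(s - avg) <= tolerance * avg for s in scores)


def first_stable_iteration(scores, tolerance=0.01):
    """Return the 1-based iteration from which all remaining scores are
    stable, or None if no window of at least two iterations is stable."""
    for start in range(len(scores) - 1):
        if is_stable(scores[start:], tolerance):
            return start + 1
    return None


if __name__ == "__main__":
    # Hypothetical warm-up scores for one fork, in records/ms.
    warmup = [150.0, 190.0, 199.0, 200.5, 200.0,
              199.5, 201.0, 200.2, 199.8, 200.1]
    print(first_stable_iteration(warmup))
```

With these made-up numbers the fork counts as stable from the 3rd warm-up
iteration onwards, which is the kind of behaviour described above.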
>
> Given that we are repeating the same benchmark in 3 different forks, we
> can have by pure chance:
> - 3 slow forks - average 200 r/ms
> - 2 slow forks, 1 fast fork - average 217 r/ms
> - 1 slow fork, 2 fast forks - average 233 r/ms
> - 3 fast forks - average 250 r/ms
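
The fork combinations listed above can be enumerated with a short Python
sketch. It assumes each fork independently lands on the slow (200
records/ms) or fast (250 records/ms) level with probability 1/2, which is
only the rough reading of the plot given above; the names are hypothetical
and this is not how the benchmark tooling itself computes anything.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

SLOW, FAST = 200, 250  # records/ms, the two "stable" levels (assumed)
FORKS = 3              # JVM forks per benchmark run


def fork_average_distribution(p_fast=Fraction(1, 2)):
    """Map each possible average over FORKS forks to its probability."""
    dist = Counter()
    for outcome in product((SLOW, FAST), repeat=FORKS):
        n_fast = outcome.count(FAST)
        prob = p_fast ** n_fast * (1 - p_fast) ** (FORKS - n_fast)
        dist[Fraction(sum(outcome), FORKS)] += prob
    return dict(sorted(dist.items()))


if __name__ == "__main__":
    for avg, prob in fork_average_distribution().items():
        print(f"average {float(avg):.1f} r/ms with probability {prob}")
```

Under that assumption the two mixed outcomes are three times as likely as
the all-slow or all-fast ones, so consecutive runs of an unchanged build
can easily report averages that differ by 15 to 30 records/ms.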
>
> So this benchmark is susceptible to entering different semi-stable
> states. As I wrote above, I guess something in the commons-io version
> bump just swayed it into a different semi-stable state :( I have never
> gotten desperate enough to actually dig into what exactly is causing this
> kind of issue.
>
> Best,
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/FLINK-18684
> [2] https://issues.apache.org/jira/browse/FLINK-27133
> [3] https://issues.apache.org/jira/browse/FLINK-27165
> [4] https://issues.apache.org/jira/browse/FLINK-31745
> [5] https://issues.apache.org/jira/browse/FLINK-35040?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20Resolved%2C%20Closed)%20AND%20text%20~%20%22serializerHeavyString%22
> [6] http://flink-speed.xyz/timeline/#/?exe=1&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=2&revs=1000
>
> On Tue, 21 May 2024 at 12:50, Rui Fan <1996fan...@gmail.com> wrote:
>
>> Hi devs:
>>
>> We (the release managers of Flink 1.20) would like to report a
>> performance regression to the Flink dev mailing list.
>>
>> # Background:
>>
>> The performance of serializerHeavyString has regressed since April 3,
>> and we created FLINK-35040 [1] to track it.
>>
>> In brief:
>> - The performance regresses only on JDK 11; Java 8 and Java 17 are fine.
>> - The cause of the regression is the commons-io upgrade from 2.11.0 to
>> 2.15.1.
>>   - This upgrade was done in FLINK-34955 [2].
>>   - The performance recovers after reverting the commons-io version to
>>     2.11.0.
>>
>> You can get more details from FLINK-35040[1].
>>
>> # Problem
>>
>> We generated flame graphs (wall mode) to analyze why upgrading the
>> commons-io version affects the performance. These flame graphs can be
>> found in FLINK-35040 [1]. (Many thanks to Zakelly for generating them
>> on the benchmark server.)
>>
>> Unfortunately, we could not find any commons-io code being called.
>> We also checked whether any other dependencies changed as part of the
>> commons-io upgrade. They did not; all other dependencies are exactly the
>> same.
>>
>> # Request
>>
>> After the above analysis, we still cannot explain why the performance of
>> serializerHeavyString regressed on JDK 11.
>>
>> We are looking forward to hearing valuable suggestions from the Flink
>> community.
>> Thanks everyone in advance.
>>
>> Note:
>> 1. I cannot reproduce the regression on my Mac with JDK 11, and we suspect
>>   this regression may be caused by the benchmark environment.
>> 2. We will accept this regression if the issue still cannot be solved.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-35040
>> [2] https://issues.apache.org/jira/browse/FLINK-34955
>>
>> Best,
>> Weijie, Ufuk, Robert and Rui
>>
>
