[
https://issues.apache.org/jira/browse/GEODE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291981#comment-17291981
]
Hale Bales commented on GEODE-8950:
-----------------------------------
- the first known CI failure was on 02/04/2021
- we do not have CI history before 02/01/2021
- these failures are occurring both in CI and when run using the scripts
- the test that is failing was added in November of 2020
- running develop against 1.13.0 does not produce consistent benchmark results
- running with a baseline of 1.13.1 does not improve the failure rate
- running 1.13.0 against itself does not produce consistently passing results
- running develop against itself does not produce consistently passing results
- there have been no changes to benchmarks this year (as of Feb 26, 2021)
- there do not appear to be any suspect changes to Geode core made this year
  - Jake Barrett, Donal Evans, and I have all looked at the commits
  - no commits are in the right area of the code
  - I have tested all code changes that had even the slightest chance of
    changing the performance in P2pPartitionedPutLongBenchmark
- the changes to dependencies do not seem to have changed the performance
- profiling the test for the following did not produce any useful information:
  - cpu usage
  - allocations
  - locks
- looking at the gfs logs showed that (on a failing run):
  - develop did fewer puts than 1.13.0
  - develop had less cpu activity
  - develop received fewer bytes
  - these results are expected for a run where develop had lower throughput
    than 1.13.0
- this benchmark has a very small payload size
  - in the past the performance team saw a high degree of sensitivity in tests
    with small payloads
conclusions:
- these failures do not appear to be caused by any code change
- these failures do not appear to be caused by any benchmarking change
- these failures do not appear to be caused by any dependency change
- the instability when running the same version/commit against itself points to
per-operation overhead dominating at such a small payload size
- there is no data to support that this failure is occurring more often than
previously
proposed next steps:
- keep running this test and keep track of the failure rate
- if the failure rate increases, investigate the peer-to-peer code
- if the failure rate stays the same, comment out the test
- long term, invest time in a significant refactor of the peer-to-peer code
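One possible way to keep track of the failure rate is sketched below. This is only an illustration, not part of the benchmark tooling: the BenchmarkBuild record and the failure_rate helper are hypothetical names, and the counts are the three runs quoted in the issue description. Those runs were reported because they failed, so the resulting number is a starting point for comparison rather than an unbiased estimate.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkBuild:
    """Hypothetical record of one Benchmark_base run."""
    build: int      # Concourse build number
    repeats: int    # repeats in the run
    degraded: int   # repeats flagged as degraded for P2pPartitionedPutLongBenchmark


def failure_rate(builds):
    """Fraction of all repeats flagged as degraded across the given builds."""
    total = sum(b.repeats for b in builds)
    if total == 0:
        return 0.0
    return sum(b.degraded for b in builds) / total


# Counts taken from the three runs quoted in the issue description.
history = [
    BenchmarkBuild(build=16, repeats=5, degraded=3),
    BenchmarkBuild(build=20, repeats=5, degraded=1),
    BenchmarkBuild(build=27, repeats=5, degraded=4),
]

print(f"flagged-degradation rate so far: {failure_rate(history):.2f}")
```

Appending one record per new build and recomputing the rate would show whether it is rising or holding steady. What counts as an increase is not defined here and would need to be agreed separately.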
> Benchmark failure - P2pPartitionedPutLongBenchmark
> --------------------------------------------------
>
> Key: GEODE-8950
> URL: https://issues.apache.org/jira/browse/GEODE-8950
> Project: Geode
> Issue Type: Bug
> Components: benchmarks
> Affects Versions: 1.15.0
> Reporter: Donal Evans
> Assignee: Hale Bales
> Priority: Major
>
> Multiple benchmark failures due to P2pPartitionedPutLongBenchmark have been
> seen recently.
> This run saw 3 out of the 5 repeats fail due to flagged degradations in
> P2pPartitionedPutLongBenchmark:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/16#L601ed52d:5552]
> This run saw 1 out of the 5 repeats fail due to flagged degradations in
> P2pPartitionedPutLongBenchmark:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/20]
> This run saw 4 out of the 5 repeats fail due to flagged degradations in
> P2pPartitionedPutLongBenchmark:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/27]
> In all the above benchmarks, the other failed runs were due to EOFExceptions
> rather than flagged degradations.