My adventure to figure out why we have significantly worse performance on 8.9.0 compared to 8.3.1 continues.
As mentioned in a previous thread, when running tests we're seeing success rates of around 90% with 8.9.0, compared to around 98% with 8.3.1. Running the tests in 1-minute cycles for better granularity, we saw that the degraded performance coincided with the times replication was running. Disabling replication finally gave us 100% success rates on both versions. However, this obviously isn't sustainable, and it doesn't explain the disparity between versions.

On further testing, it became apparent that although both versions suffer while replication is running, 8.9.0 suffers for much longer. Even though replication might only take seconds to complete, 8.3.1 was never degraded for more than 2 minutes at worst, whereas 8.9.0 could suffer for over 5 minutes.

As an example, here are the results of running tests on both versions with replication disabled, but manually triggering it twice, each time after waiting for both instances to regain 100%. You can see that 8.3.1 suffers two fallouts, each lasting a single test run, dropping to 78% when replication occurred. 8.9.0, however, dropped to 70% and took 7 minutes to regain full capacity.
Any ideas on where to look for the cause of this disparity would be most appreciated!

8.3.1                        | 8.9.0
Thu Oct 21 14:36:03 UTC 2021 | Thu Oct 21 14:36:04 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 100.00%
Thu Oct 21 14:37:04 UTC 2021 | Thu Oct 21 14:37:05 UTC 2021
<-- replication -->          | <-- replication -->
Success [ratio] 78.17%       | Success [ratio] 76.00%
Thu Oct 21 14:38:26 UTC 2021 | Thu Oct 21 14:38:37 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 70.00%
Thu Oct 21 14:39:28 UTC 2021 |
Success [ratio] 100.00%      |
Thu Oct 21 14:40:29 UTC 2021 | Thu Oct 21 14:40:07 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 70.00%
Thu Oct 21 14:41:31 UTC 2021 | Thu Oct 21 14:41:38 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 70.00%
Thu Oct 21 14:42:32 UTC 2021 |
Success [ratio] 100.00%      |
Thu Oct 21 14:43:34 UTC 2021 | Thu Oct 21 14:43:09 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 78.17%
Thu Oct 21 14:44:35 UTC 2021 | Thu Oct 21 14:44:27 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 100.00%
Thu Oct 21 14:45:37 UTC 2021 | Thu Oct 21 14:45:29 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 100.00%
Thu Oct 21 14:46:38 UTC 2021 | Thu Oct 21 14:46:31 UTC 2021
<-- replication -->          | <-- replication -->
Success [ratio] 78.50%       | Success [ratio] 91.33%
Thu Oct 21 14:48:01 UTC 2021 | Thu Oct 21 14:48:02 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 70.00%
Thu Oct 21 14:49:03 UTC 2021 | Thu Oct 21 14:49:33 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 70.00%
Thu Oct 21 14:50:04 UTC 2021 |
Success [ratio] 100.00%      |
Thu Oct 21 14:51:07 UTC 2021 | Thu Oct 21 14:51:04 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 86.83%
Thu Oct 21 14:52:08 UTC 2021 | Thu Oct 21 14:52:05 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 100.00%
Thu Oct 21 14:53:10 UTC 2021 | Thu Oct 21 14:53:07 UTC 2021
Success [ratio] 100.00%      | Success [ratio] 100.00%

(Blank lines for 8.9.0 are just to keep the times in sync: test cycles were reliably 1 minute, but waiting for connections to close resulted in delays between cycles.)
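
In case it helps anyone compare runs, here is a minimal Python sketch of how the recovery time could be measured from output like the above, rather than read off by eye. It is only an illustration: the names `parse_cycles`, `degraded_windows` and `SAMPLE` are hypothetical, the sample is the 8.9.0 output around the first replication, and treating anything below 100% as "degraded" is an assumption.

```python
import re
from datetime import datetime

# Sample: the 8.9.0 results around the first replication (copied from the post).
SAMPLE = """\
Thu Oct 21 14:36:04 UTC 2021
Success [ratio] 100.00%
Thu Oct 21 14:37:05 UTC 2021
<-- replication -->
Success [ratio] 76.00%
Thu Oct 21 14:38:37 UTC 2021
Success [ratio] 70.00%
Thu Oct 21 14:40:07 UTC 2021
Success [ratio] 70.00%
Thu Oct 21 14:41:38 UTC 2021
Success [ratio] 70.00%
Thu Oct 21 14:43:09 UTC 2021
Success [ratio] 78.17%
Thu Oct 21 14:44:27 UTC 2021
Success [ratio] 100.00%
"""

RATIO_RE = re.compile(r"Success \[ratio\]\s+([\d.]+)%")
# Day and month names are parsed using the current locale (assumed English here).
TIME_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def parse_cycles(text):
    """Return a list of (cycle_start, success_ratio) pairs."""
    cycles = []
    started = None
    for line in text.splitlines():
        line = line.strip()
        match = RATIO_RE.search(line)
        if match:
            if started is not None:
                cycles.append((started, float(match.group(1))))
                started = None
            continue
        try:
            started = datetime.strptime(line, TIME_FORMAT)
        except ValueError:
            pass  # marker lines such as "<-- replication -->" are skipped
    return cycles


def degraded_windows(cycles, healthy=100.0):
    """Return (first_degraded_start, first_healthy_start, worst_ratio) tuples.

    A window that is still open when the data ends is not reported.
    """
    windows = []
    start = worst = None
    for started, ratio in cycles:
        if ratio < healthy:
            if start is None:
                start, worst = started, ratio
            else:
                worst = min(worst, ratio)
        elif start is not None:
            windows.append((start, started, worst))
            start = worst = None
    return windows


if __name__ == "__main__":
    for start, end, worst in degraded_windows(parse_cycles(SAMPLE)):
        print(f"degraded from {start} to {end} ({end - start}), worst {worst:.2f}%")
```

Running the same function over each version's output separately gives one duration per replication event, which makes it easier to compare the two versions across many runs.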