12rcu commented on PR #5158: URL: https://github.com/apache/ignite-3/pull/5158#issuecomment-2629451092
## About the changes

This PR is not intended to fix the flakiness of this test per se, but to mitigate its timeout issues. I have implemented three mitigations in this PR.

### Repeat the test

I changed the test annotation to a repeated test, so the test can fail once and still pass; it still has to pass at least once. One drawback is that this test will always run twice.

### Increase election timeout

One source of flakiness during testing was the election timeout: whenever it fired, the following times of the nodes changed. The election timeout is now increased after the first leader election, because that first election itself requires a timeout. A drawback is that it feels a bit hacky to just reset the timeout of all nodes. Another drawback is that the leader might actually need a timeout to get re-elected (which would fail the test), so I added a 25-second timeout to the test.

### Increase wait for timeout assertions

Another source of flakiness was the busy wait in the assertion on the following times of the nodes. I increased this timeout by 2 seconds.

## Introduction of assertwaitForCondition()

During testing this was a pretty nice addition: it let me see which values the method was actually receiving and log them when the assertion failed.

## Other options for logging or preventing flakiness

I thought about registering a new event handler that would track the timeouts of the nodes and either dynamically raise the value expected from `getOnStartFollowingTimes()` or fail the test if a timeout was triggered. I decided against this, as it might not help with fault tolerance and would still increase complexity.
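For readers unfamiliar with the pattern behind `assertwaitForCondition()`, here is a minimal sketch of a busy-wait assertion that reports the last observed value when it times out. It is not the helper added in this PR: the class name `WaitAssertions`, the method `awaitValue`, its parameters, and the 10 ms polling interval are all hypothetical and chosen only for illustration.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not the method introduced in the PR. */
public final class WaitAssertions {
    private WaitAssertions() {
    }

    /**
     * Polls {@code supplier} until {@code condition} accepts the value or the timeout elapses.
     * On timeout, the failure message contains the last value that was observed.
     */
    public static <T> T awaitValue(Supplier<T> supplier, Predicate<T> condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        T last = supplier.get();

        while (!condition.test(last)) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(
                        "Condition not met within " + timeoutMillis + " ms, last observed value: " + last);
            }

            try {
                Thread.sleep(10); // Assumed polling interval.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting, last observed value: " + last, e);
            }

            last = supplier.get();
        }

        return last;
    }
}
```

In a test like the one discussed here, the supplier would read the value under assertion (for example the result of `getOnStartFollowingTimes()`), so a timeout failure shows what the node actually reported instead of only a failed boolean.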