This is a great discussion, thanks for starting it. I support turning off retries for new tests, as Lari suggests. However, I will point out that this change is not the same as the initially proposed change to increase the number of times a new test is run (point (1) in the original email for this thread). I am not sure if we need to run new tests extra times to test for flakiness, but I definitely agree that we should not default to retrying newly added tests.
Alternatively, would it be possible to only retry tests that are annotated as "flaky"? This would exclude all new tests, by default, and it'd make it clearer which tests need to be fixed. > I see. Now we retry all tests by default. What's the reason? Test retries are enabled precisely because some tests are flaky. Thanks, Michael On Tue, Mar 8, 2022 at 7:57 AM Baozi <wudixiaolong...@icloud.com.invalid> wrote: > > Good job! > > I see. Now we retry all tests by default. What's the reason? > > The process I understand is: > We don't need to retry the unit test. Once the test fails, let the job fail. > The difference is that new or modified tests need to be tested many times. As > long as there is one failure, the job will fail. > > > 2022年3月8日 21:0956,Lari Hotari <lhot...@apache.org> 写道: > > > > Here's a PR to disable tests retries for new and modified tests: > > https://github.com/apache/pulsar/pull/14607 > > > > Please review and provide feedback. > > > > -Lari > > > > On 2022/03/08 11:24:13 Lari Hotari wrote: > >> I think this is a good idea. In CI, the tests will be retried multiple > >> times at multiple levels: at the test class / test method level and at the > >> build level. > >> > >> I'd like to get rid of test retries completely, since that is the reason > >> why the flakiness problem gets worse over time. This was also covered in a > >> proposal that was made about a year ago: > >> https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit > >> . > >> "since the current build tolerates flaky tests, the current test retry > >> solution leads to more flaky tests being added to the code base over time." > >> > >> It would be possible to make our CI behave in a way that test method level > >> retries are disabled for test classes that are modified or added in a PR. > >> > >> In GitHub Actions workflows we already have a solution for detecting the > >> changed files. > >> > >> - name: Detect changed files > >> id: changes > >> uses: apache/pulsar-test-infra/paths-filter@master > >> with: > >> filters: .github/changes-filter.yaml > >> > >> It would be possible to make the build work in a way that it passes the > >> list of changed tests in a system property or environment variable. This > >> could be passed to org.apache.pulsar.tests.RetryAnalyzer which could be > >> modified to disable retries for the tests listed in the parameter that was > >> passed via a system property or environment value. > >> > >> Besides this, it would be necessary to disable the build level test retry > >> loop (the build/retry.sh solution) when any test in modified in the module. > >> It would be possible to check the failures and skip retrying if one of the > >> failures is one of the changed tests. > >> > >> -Lari > >> > >> > >> On Mon, Mar 7, 2022 at 5:12 AM 包子 <wudixiaolong...@icloud.com.invalid> > >> wrote: > >> > >>> Hi, I want to discuss how to improve the stability of unit testing.I found > >>> that most flaky unit tests can be reproduc locally and only need to be > >>> executed a few more times. > >>> > >>> Can we add the following mandatory constraints to ensure the stability of > >>> unit testing of code? > >>> 1. If new / modified unit tests are included in PR, they need to be run > >>> continuously on CI for more than n times. > >>> 2. We can write scripts and parse git change records to find new or > >>> modified unit tests. > >>> > >>> > >> >