Re: [DISUSS] Improve unit test stability

Lari Hotari Wed, 09 Mar 2022 05:19:47 -0800

> However, I will point out that this change is not the same as the
> initially proposed change to increase the number of times a new test
> is run (point (1) in the original email for this thread). I am not


that's true. Increasing the number of times to run a test method is executed 
could be a breaking change. TestNG supports running each test method multiple 
times (invocationCount parameter on the Test annotation). A lot of the tests 
are stateful and don't support running the test method multiple times without 
resetting state between calls. For test retries, this problem has been solved 
by the TestRetrySupport base class. That was added by PR 
https://github.com/apache/pulsar/pull/9823 and it describes the problem and the 
solution for test retries. 
Running each test class that has been modified multiple times would be a 
solution that would work, but that's not available out-of-the-box in TestNG.
This is the reason why I didn't add the solution to run the test multiple 
times. 
I haven't found a way how to configure TestNG to run certain test classes 
multiple times.  As mentioned above, the built-in solution executes a test 
method multiple times and that might not be supported by the test class. It's 
possible to solve that problem, but it would require some effort.

-Lari

On 2022/03/08 20:26:49 Michael Marshall wrote:
> This is a great discussion, thanks for starting it.
> 
> I support turning off retries for new tests, as Lari suggests.
> However, I will point out that this change is not the same as the
> initially proposed change to increase the number of times a new test
> is run (point (1) in the original email for this thread). I am not
> sure if we need to run new tests extra times to test for flakiness,
> but I definitely agree that we should not default to retrying newly
> added tests.
> 
> Alternatively, would it be possible to only retry tests that are
> annotated as "flaky"? This would exclude all new tests, by default,
> and it'd make it clearer which tests need to be fixed.
> 
> > I see. Now we retry all tests by default. What's the reason?
> 
> Test retries are enabled precisely because some tests are flaky.
> 
> Thanks,
> Michael
> 
> 
> On Tue, Mar 8, 2022 at 7:57 AM Baozi <wudixiaolong...@icloud.com.invalid> 
> wrote:
> >
> > Good job!
> >
> > I see. Now we retry all tests by default. What's the reason?
> >
> > The process I understand is:
> > We don't need to retry the unit test. Once the test fails, let the job fail.
> > The difference is that new or modified tests need to be tested many times. 
> > As long as there is one failure, the job will fail.
> >
> > > 2022年3月8日 21:0956，Lari Hotari <lhot...@apache.org> 写道：
> > >
> > > Here's a PR to disable tests retries for new and modified tests:
> > > https://github.com/apache/pulsar/pull/14607
> > >
> > > Please review and provide feedback.
> > >
> > > -Lari
> > >
> > > On 2022/03/08 11:24:13 Lari Hotari wrote:
> > >> I think this is a good idea. In CI, the tests will be retried multiple
> > >> times at multiple levels: at the test class / test method level and at 
> > >> the
> > >> build level.
> > >>
> > >> I'd like to get rid of test retries completely, since that is the reason
> > >> why the flakiness problem gets worse over time. This was also covered in 
> > >> a
> > >> proposal that was made about a year ago:
> > >> https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit
> > >> .
> > >> "since the current build tolerates flaky tests, the current test retry
> > >> solution leads to more flaky tests being added to the code base over 
> > >> time."
> > >>
> > >> It would be possible to make our CI behave in a way that test method 
> > >> level
> > >> retries are disabled for test classes that are modified or added in a PR.
> > >>
> > >> In GitHub Actions workflows we already have a solution for detecting the
> > >> changed files.
> > >>
> > >>      - name: Detect changed files
> > >>        id:   changes
> > >>        uses: apache/pulsar-test-infra/paths-filter@master
> > >>        with:
> > >>          filters: .github/changes-filter.yaml
> > >>
> > >> It would be possible to make the build work in a way that it passes the
> > >> list of changed tests in a system property or environment variable. This
> > >> could be passed to  org.apache.pulsar.tests.RetryAnalyzer which could be
> > >> modified to disable retries for the tests listed in the parameter that 
> > >> was
> > >> passed via a system property or environment value.
> > >>
> > >> Besides this, it would be necessary to disable the build level test retry
> > >> loop (the build/retry.sh solution) when any test in modified in the 
> > >> module.
> > >> It would be possible to check the failures and skip retrying if one of 
> > >> the
> > >> failures is one of the changed tests.
> > >>
> > >> -Lari
> > >>
> > >>
> > >> On Mon, Mar 7, 2022 at 5:12 AM 包子 <wudixiaolong...@icloud.com.invalid>
> > >> wrote:
> > >>
> > >>> Hi, I want to discuss how to improve the stability of unit testing.I 
> > >>> found
> > >>> that most flaky unit tests can be reproduc locally and only need to be
> > >>> executed a few more times.
> > >>>
> > >>> Can we add the following mandatory constraints to ensure the stability 
> > >>> of
> > >>> unit testing of code?
> > >>> 1. If new / modified unit tests are included in PR, they need to be run
> > >>> continuously on CI for more than n times.
> > >>> 2. We can write scripts and parse git change records to find new or
> > >>> modified unit tests.
> > >>>
> > >>>
> > >>
> >
>

Re: [DISUSS] Improve unit test stability

Reply via email to