I agree with Ron.
I think improving the framework with a configurable number of retries on
some tests will yield the highest ROI in terms of passing builds.
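
To make the idea concrete, here is a rough sketch of what a "retry up
to N times" knob could look like as a JUnit 4 rule. The names are made
up and this isn't an existing utility in our tree, just an
illustration of the mechanism:

    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    public class RetryRule implements TestRule {
        private final int maxAttempts;

        public RetryRule(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Statement apply(Statement base, Description description) {
            return new Statement() {
                @Override
                public void evaluate() throws Throwable {
                    Throwable last = null;
                    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                        try {
                            base.evaluate(); // run the test body
                            return;          // passed, stop retrying
                        } catch (Throwable t) {
                            last = t;
                            System.err.println(description
                                + " failed attempt " + attempt);
                        }
                    }
                    throw last; // every attempt failed: a real failure
                }
            };
        }
    }

A test known to be flaky would then opt in with something like
"@Rule public final RetryRule retry = new RetryRule(3);", keeping the
retry count configurable per test class.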

On Fri, Mar 8, 2019 at 10:48 PM Ron Dagostino <rndg...@gmail.com> wrote:

> It's a classic problem: you can't string N things together serially and
> expect high reliability.  5,000 tests in a row isn't going to give you a
> bunch of 9's.  It feels to me that the test frameworks themselves should
> support a more robust model -- like a way to tag a test as "retry me up to
> N times before you really consider me a failure" or something like that.
>
> Ron
>
> On Fri, Mar 8, 2019 at 11:40 AM Stanislav Kozlovski <stanis...@confluent.io> wrote:
>
> > > For about half a year now we have had an internal improvement that
> > > reruns the flaky test classes at the end of the Gradle test task and
> > > lets you know that they were rerun and are probably flaky. It fails
> > > the build only if the second run of the test class was also
> > > unsuccessful. I think it works pretty well; we mostly have green
> > > builds. If there is interest, I can try to contribute that.
> >
> > That does sound very intriguing. Does it rerun the test classes that
> > failed, or some known, marked classes? If it is the former, I can see
> > a lot of value in having that automated in our PR builds. I wonder
> > what others think of this.
> >
> > On Thu, Feb 28, 2019 at 6:04 PM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote:
> >
> > > Hey All,
> > >
> > > Thanks for all the ideas.
> > >
> > > @Stanislav, @Sonke
> > > I probably left it out of my email, but I really imagined this as a
> > > case-by-case change. If we think it wouldn't cause problems for a
> > > given test class, then it might be applied there. That way we'd
> > > limit the blast radius somewhat.
> > > The one-hour gain is really just the most optimistic scenario; I'm
> > > almost sure that not every test could be transformed to use a common
> > > cluster.
> > > For about half a year now we have had an internal improvement that
> > > reruns the flaky test classes at the end of the Gradle test task and
> > > lets you know that they were rerun and are probably flaky. It fails
> > > the build only if the second run of the test class was also
> > > unsuccessful. I think it works pretty well; we mostly have green
> > > builds. If there is interest, I can try to contribute that.
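> > >
> > > Roughly, the rerun behaves like the sketch below. This is plain
> > > JUnit 4 (JUnitCore) for illustration only, with made-up names; the
> > > real improvement is wired into the Gradle test task rather than a
> > > standalone main class:
> > >
> > >   import java.util.ArrayList;
> > >   import java.util.List;
> > >   import org.junit.runner.JUnitCore;
> > >   import org.junit.runner.Result;
> > >
> > >   // Hypothetical sketch: run each test class once, collect the
> > >   // failures, rerun those at the end, and treat a class as a real
> > >   // failure only if it fails both times.
> > >   public class RerunFailedClasses {
> > >       public static void main(String[] args) throws Exception {
> > >           List<Class<?>> failedOnce = new ArrayList<>();
> > >           for (String className : args) {
> > >               Class<?> testClass = Class.forName(className);
> > >               if (!JUnitCore.runClasses(testClass).wasSuccessful())
> > >                   failedOnce.add(testClass);
> > >           }
> > >           boolean anyFailedTwice = false;
> > >           for (Class<?> testClass : failedOnce) {
> > >               Result second = JUnitCore.runClasses(testClass);
> > >               if (second.wasSuccessful()) {
> > >                   System.out.println(testClass.getName()
> > >                       + " passed on rerun, probably flaky");
> > >               } else {
> > >                   System.err.println(testClass.getName()
> > >                       + " failed twice");
> > >                   anyFailedTwice = true;
> > >               }
> > >           }
> > >           if (anyFailedTwice)
> > >               System.exit(1); // fail only on a double failure
> > >       }
> > >   }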
> > >
> > > > I am also extremely annoyed at times by the amount of coffee I
> > > > have to drink before tests finish
> > > Just please don't get a heart attack :)
> > >
> > > @Ron, @Colin
> > > You bring up a very good point: it is easier and frees up more
> > > resources if we just run the change-specific tests, and it's good to
> > > know that a similar solution (meaning a shared resource for testing)
> > > has failed elsewhere. I second Ron on the test categorization,
> > > although as a first attempt I think a flaky-test retry plus running
> > > only the necessary tests would help with both time savings and
> > > effectiveness. It would also be easier to achieve.
> > >
> > > @Ismael
> > > Yeah, it'd be interesting to profile the startup/shutdown; I've
> > > never done that. Perhaps I'll set aside some time for it :). It's
> > > definitely true, though, that if we do see a significant delay
> > > there, we wouldn't just improve the efficiency of the tests but also
> > > the customer experience.
> > >
> > > Best,
> > > Viktor
> > >
> > >
> > >
> > > On Thu, Feb 28, 2019 at 8:12 AM Ismael Juma <isma...@gmail.com> wrote:
> > >
> > > > It's an idea that has come up before and is worth exploring
> > > > eventually. However, I'd first try to optimize the server
> > > > startup/shutdown process. If we measure where the time is going,
> > > > maybe some opportunities will present themselves.
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > I've been observing lately that the unit tests usually take 2.5
> > > > > hours to run, and a very big portion of that is the core tests,
> > > > > where a new cluster is spun up for every test. This takes most
> > > > > of the time. I ran one test class (TopicCommandWithAdminClient,
> > > > > with 38 tests inside) through the profiler, and it shows, for
> > > > > instance, that running the whole class took 10 minutes and 37
> > > > > seconds, of which the useful time was 5 minutes and 18 seconds.
> > > > > That's 100% overhead. Without the profiler the whole class takes
> > > > > 7 minutes and 48 seconds, so the useful time would be between 3
> > > > > and 4 minutes. This is a bigger test class, though; most of them
> > > > > won't take this long.
> > > > > There are 74 classes that implement KafkaServerTestHarness, and
> > > > > just running :core:integrationTest takes almost 2 hours.
> > > > >
> > > > > I think we could greatly speed up these integration tests by
> > > > > creating the cluster just once per class and performing the
> > > > > tests in separate methods. I know this somewhat contradicts the
> > > > > principle that tests should be independent, but it seems that
> > > > > recreating the cluster for each test is a very expensive
> > > > > operation. Also, if the tests act on different resources
> > > > > (different topics, etc.), then it might not hurt their
> > > > > independence; see the sketch below. There will of course be
> > > > > cases where this is not possible, but I think there could be a
> > > > > lot where it is.
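> > > > >
> > > > > Just to sketch the shape of it in JUnit 4 (hypothetical names;
> > > > > EmbeddedKafkaCluster here stands in for whatever harness we'd
> > > > > actually reuse, it is not an existing class in our tree):
> > > > >
> > > > >   import org.junit.AfterClass;
> > > > >   import org.junit.BeforeClass;
> > > > >   import org.junit.Test;
> > > > >
> > > > >   public class TopicCommandSharedClusterTest {
> > > > >       // Hypothetical harness type standing in for
> > > > >       // KafkaServerTestHarness-style cluster management.
> > > > >       private static EmbeddedKafkaCluster cluster;
> > > > >
> > > > >       @BeforeClass
> > > > >       public static void startCluster() {
> > > > >           // Spin the cluster up once for the whole class
> > > > >           // instead of once per test method.
> > > > >           cluster = new EmbeddedKafkaCluster(3);
> > > > >           cluster.start();
> > > > >       }
> > > > >
> > > > >       @AfterClass
> > > > >       public static void stopCluster() {
> > > > >           cluster.shutdown();
> > > > >       }
> > > > >
> > > > >       @Test
> > > > >       public void testCreateTopic() {
> > > > >           // Each test acts on its own topic, which keeps the
> > > > >           // tests reasonably independent of each other.
> > > > >           cluster.createTopic("create-topic-test", 1, 1);
> > > > >           // ... assertions against the shared cluster ...
> > > > >       }
> > > > >   }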
> > > > >
> > > > > In the optimal case we could cut the testing time back by
> > > > > approximately an hour. This would save resources and give
> > > > > quicker feedback for PR builds.
> > > > >
> > > > > What are your thoughts?
> > > > > Has anyone thought about this, or have any attempts been made?
> > > > >
> > > > > Best,
> > > > > Viktor
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


-- 
Best,
Stanislav
