Sorry for jumping in so boldly before.

TL;DR:

   - I didn't mean to delete every flaky test just like that
   - To improve quality, each failing test has to be analyzed individually
   before a release

More thoughts on that:

I had a closer look at some of the tests tagged as flaky and realized that
the situation is more complex than I previously thought.
Of course I didn't mean to delete all the flaky tests just like that. Maybe
I should rephrase it a bit: "If a (flaky) test can't really prove
something, then it is better not to have it". Whether a test does prove
something depends on its intention, its implementation, on how flaky it
really is and, first of all: why.

These dtests are maybe a blessing and a curse at the same time. On the one
hand there are things you cannot test with a unit test, so you need them for
certain cases. On the other hand, dtests do not only test the desired case.

   - They test the test environment (ccm, server hiccups) and more or less
   all components of the CS daemon that are somehow involved as well.
   - This exposes the test to many more error sources than the bare test
   case, which of course creates a lot of "unreliability" in general and
   causes flaky results.
   - It makes it hard to pin down the failures to a certain cause like
      - Flaky test implementation
      - Flaky bugs in SUT
      - Unreliable test environment
   - Analyzing every failure is a pain. But a simple "retry and skip over"
   _may_ mask a real problem (see the sketch right below).
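
To illustrate that last point, here is a minimal, purely hypothetical Python
sketch of the kind of "retry until green" helper I mean (it is not taken from
the dtest code base; the names and the attempt count are made up). A genuine
intermittent bug never surfaces as long as one of the attempts happens to
pass:

    import functools

    def retry_on_failure(attempts=3):
        # Hypothetical "retry and skip over" helper: rerun a flaky test and
        # report success as soon as any single attempt passes.
        def decorator(test_func):
            @functools.wraps(test_func)
            def wrapper(*args, **kwargs):
                last_error = None
                for _ in range(attempts):
                    try:
                        return test_func(*args, **kwargs)  # first green attempt wins
                    except AssertionError as error:
                        last_error = error  # a real intermittent failure is swallowed here
                raise last_error  # only reached if *every* attempt fails
            return wrapper
        return decorator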

=> Difficult situation!

From my own projects and non-CS experience I can tell:
Flaky tests give me a bad feeling and always leave a certain smell. I've
also just skipped them with the reasoning "Yes, I know it's flaky, I don't
really care about it". But it simply does not feel right.

A real-life example from another project:
Some weeks ago I wrote functional tests to test the integration of
SeaweedFS as a blob store backend in an image upload process. The test case
was roughly: upload an image, check if it exists on both the old and the new
image storage, delete it, check it again. The test had existed for years. I
simply added some assertions to check the existence of the uploaded files on
the new storage. Funnily enough, I must have hit some corner case with that,
and from that moment on the test was flaky. Simple URL checks started to
time out from time to time. That made me really curious. To cut a long story
short: after having checked a whole lot of things, it turned out that
neither the test nor the shiny new storage was flaky; it was the LVS load
balancer. The load balancer reproducibly dropped connections, and this
happened more often with increasing concurrency. Finally we removed LVS
completely and replaced it with DNS-RR + VRRP, which completely solved the
problem, and the tests ran happily ever after.
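
For the curious, the test was roughly shaped like the following sketch. It is
purely illustrative: the endpoints and field names are made up and not the
actual project code.

    # Upload an image, verify it on both backends, delete it, verify it is gone.
    import requests

    UPLOAD_URL = "https://upload.example.com/images"     # hypothetical endpoint
    OLD_STORE = "https://old-store.example.com/images"   # hypothetical endpoint
    NEW_STORE = "https://new-store.example.com/images"   # SeaweedFS-backed, hypothetical

    def test_image_upload_roundtrip():
        with open("fixture.jpg", "rb") as image:
            response = requests.post(UPLOAD_URL, files={"image": image})
        image_id = str(response.json()["id"])

        # The newly added assertions: the blob must exist on the old *and* the new storage.
        for base in (OLD_STORE, NEW_STORE):
            assert requests.head(base + "/" + image_id, timeout=5).status_code == 200

        requests.delete(UPLOAD_URL + "/" + image_id)

        for base in (OLD_STORE, NEW_STORE):
            assert requests.head(base + "/" + image_id, timeout=5).status_code == 404

Nothing in a test like that is fancy, which is exactly why the sporadic
timeouts pointed away from the test itself.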

Usually there is no pure black and white.

   - Sometimes testing whole systems reveals problems you'd never
   have found without them
   - Sometimes they cause false alerts
   - Sometimes, skipping them masks real problems
   - Sometimes it sucks if a false alert blocks your release

If you want to be really safe, you have to analyze every single failure,
decide what kind of failure it is or could be, and whether a retry will
prove something or not. At least when you are at a release gate. I think
this should be worth it.

There's a reason for this thread and there's a reason why people ask every
few days which CS version is production stable. Things have to improve over
time. This applies to test implementations, test environments, release
processes, and so on. One way to do this is to become a little bit stricter
(and a bit better) with every release. Making all tests pass at least once
before a release should be rather low-hanging fruit. Reducing the total
number of flaky tests or the "flaky-fail-rate" may be another future goal.

Btw, fact of the day:
I grepped through the dtests and found that roughly 11% of all tests are
flagged with "known_failure" and roughly 8% of all tests are flagged as
"flaky". Quite impressive.


2016-12-03 15:52 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:

> I think it is fair to run a flaky test again. If it is determined it flaked
> out due to a conflict with another test or something ephemeral in a long
> process, it is not worth blocking a release.
>
> Just deleting it is probably not a good path.
>
> I actually enjoy writing, fixing, and tweaking tests, so ping me offline or
> whatever.
>
> On Saturday, December 3, 2016, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
> > Excuse me if I jump into an old thread, but from my experience, I have a
> > very clear opinion about situations like that as I encountered them
> > before:
> >
> > Tests are there to give *certainty*.
> > *Would you like to pass a crossing with a green light if you cannot be
> > sure if green really means green?*
> > Do you want to rely on tests that are green, red, green, red? What if a
> > red is a real red and you missed it because you simply ignore it because
> > it's flaky?
> >
> > IMHO there are only 3 options how to deal with broken/red tests:
> > - Fix the underlying issue
> > - Fix the test
> > - Delete the test
> >
> > If I cannot trust a test, it is better not to have it at all. Otherwise
> > people are staring at red lights and start to drive.
> >
> > This causes:
> > - Uncertainty
> > - Loss of trust
> > - Confusion
> > - More work
> > - *Less quality*
> >
> > Just as an example:
> > Few days ago I created a patch. Then I ran the utest and 1 test failed.
> > Hmmm, did I break it? I had to check it twice by checking out the former
> > state, running the tests again just to recognize that it wasn't me who
> > made it fail. That's annoying.
> >
> > Sorry again, I'm rather new here but what I just read reminded me much of
> > situations I have been in years ago.
> > So: +1, John
> >
> > 2016-12-03 7:48 GMT+01:00 sankalp kohli <kohlisank...@gmail.com>:
> >
> > > Hi,
> > >     I don't see any update on this thread. We will go ahead and make
> > > Dtest a blocker for cutting a release for anything after 3.10.
> > >
> > > Please respond if anyone has an objection to this.
> > >
> > > Thanks,
> > > Sankalp
> > >
> > >
> > >
> > > On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie <jmcken...@apache.org>
> > > wrote:
> > >
> > > > Caveat: I'm strongly in favor of us blocking a release on a non-green
> > > > test board of either utest or dtest.
> > > >
> > > >
> > > > > put something in prod which is known to be broken in obvious ways
> > > >
> > > > In my experience the majority of fixes are actually shoring up
> > > > low-quality / flaky tests or fixing tests that have been invalidated
> > > > by a commit but do not indicate an underlying bug. Inferring "tests
> > > > are failing so we know we're asking people to put things in prod that
> > > > are broken in obvious ways" is hyperbolic. A more correct statement
> > > > would be: "Tests are failing so we know we're shipping with a test
> > > > that's failing" which is not helpful.
> > > >
> > > > Our signal to noise ratio with tests has been very poor historically;
> > > > we've been trying to address that through aggressive triage and
> > > > assigning out test failures however we need far more active and
> > > > widespread community involvement if we want to truly *fix* this
> > > > problem long-term.
> > > >
> > > > On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad <j...@jonhaddad.com>
> > > > wrote:
> > > >
> > > > > +1.  Kind of silly to advise people to put something in prod which
> > > > > is known to be broken in obvious ways
> > > > >
> > > > > On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <kohlisank...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >     We should not cut a release if Dtests are not passing. I won't
> > > > > > block 3.10 on this since we are just discussing this.
> > > > > >
> > > > > > Please provide feedback on this.
> > > > > >
> > > > > > Thanks,
> > > > > > Sankalp
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Benjamin Roth
> > Prokurist
> >
> > Jaumo GmbH · www.jaumo.com
> > Wehrstraße 46 · 73035 Göppingen · Germany
> > Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> > AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
> >
>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
