Hi Michael,

Thanks for this update. As a newbie, it helped me understand the organization and the processes a little better.
I don't know how many CS devs know this, but I love this rule (actually the whole book): http://programmer.97things.oreilly.com/wiki/index.php/The_Boy_Scout_Rule

To be honest, I'm personally not the kind of guy who walks through lists looking for issues that could be picked up and done, but if I encounter anything (a test, some weird code, a design, whatever) that deserves to be improved, analyzed or fixed and I have a little time left, I try to improve or fix it.

Right now I am still quite new around here and in the process of understanding the whole picture of Cassandra's behaviour, code, processes and organization. I hope you can forgive me if I don't get the point perfectly right every time - but I am eager to learn and improve. Thanks for your patience!

2016-12-04 19:33 GMT+01:00 Michael Shuler <mich...@pbandjelly.org>:

> Thanks for your thoughts on testing Apache Cassandra, I share them.
>
> I just wanted to note that the known_failure() annotations were recently removed from cassandra-dtest [0], because annotations were not being removed once bugs were fixed, and the internal webapp we were using to parse them has been broken for quite some time, with no fix in sight. The webapp was removed and we dropped all the known_failure() annotations.
>
> The test-failure JIRA label [1] is what we've been using during test run triage. The tickets assigned to 'DS Test Eng' still need to be sorted out as either a test problem or a Cassandra problem. Typically, the Unassigned tickets were determined to be possibly a Cassandra issue. If you enjoy test analysis and fixing tests, please jump in and analyze/fix them!
>
> [0] https://github.com/riptano/cassandra-dtest/pull/1399
> [1] https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20labels%20%3D%20test-failure%20AND%20resolution%20%3D%20unresolved
>
> --
> Kind regards,
> Michael Shuler
>
> On 12/04/2016 02:07 AM, Benjamin Roth wrote:
> > Sorry for jumping in so boldly before.
> >
> > TL;DR:
> >
> > - I didn't mean to delete every flaky test just like that
> > - To improve quality, each failing test has to be analyzed individually for a release
> >
> > More thoughts on that:
> >
> > I had a closer look at some of the tests tagged as flaky and realized that the situation here is more complex than I thought before.
> > Of course I didn't mean to delete all the flaky tests just like that. Maybe I should rephrase it a bit to "If a (flaky) test can't really prove something, then it is better not to have it". Whether a test proves something depends on its intention, its implementation, on how flaky it really is and, first of all, on why it is flaky.
> >
> > These dtests are maybe a blessing and a curse at the same time. On the one hand there are things you cannot test with a unit test, so you need them for certain cases. On the other hand, dtests do not only test the desired case.
> >
> > - They test the test environment (ccm, server hiccups) and, more or less, all components of the CS daemon that are somehow involved as well.
> > - This exposes the test to many more error sources than the bare test case, which of course creates a lot of "unreliability" in general and causes flaky results.
> > - It makes it hard to pin down a failure to a certain cause, like:
> >   - a flaky test implementation
> >   - flaky bugs in the SUT
> >   - an unreliable test environment
> > - Analyzing every failure is a pain. But a simple "retry and skip over" _may_ mask a real problem (see the sketch below).
> >
> > => Difficult situation!
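To make the "retry and skip over" trade-off concrete, here is a minimal, purely hypothetical sketch of a bounded-retry wrapper for a known-flaky test. It is not the known_failure() machinery that was removed from cassandra-dtest; the names retry_flaky and max_attempts are made up for illustration.

```python
# Hypothetical sketch only: a bounded-retry wrapper for a known-flaky test.
# This is NOT the known_failure() annotation removed from cassandra-dtest;
# the names (retry_flaky, max_attempts) are made up for illustration.
import functools
import random


def retry_flaky(max_attempts=3):
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return test_func(*args, **kwargs)
                except AssertionError as exc:
                    last_error = exc
                    print("attempt %d/%d failed: %s" % (attempt, max_attempts, exc))
            raise last_error  # the test only fails if every attempt failed
        return wrapper
    return decorator


@retry_flaky(max_attempts=3)
def test_sometimes_times_out():
    # Stand-in for a dtest that occasionally hits an environment hiccup
    # (or a real, intermittent bug -- the retry cannot tell the difference).
    assert random.random() > 0.3, "simulated timeout"
```

Run directly, test_sometimes_times_out() passes most of the time even though roughly 30% of individual attempts fail, which is exactly the masking effect described above: the retry hides an intermittent failure whether its cause is the environment or a genuine bug.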
> > From my own projects and non-CS experience I can tell: flaky tests give me a bad feeling and always leave a certain smell. I've also just skipped them with the reasoning "Yes, I know it's flaky, I don't really care about it". But it simply does not feel right.
> >
> > A real-life example from another project:
> > Some weeks ago I wrote functional tests to test the integration of SeaweedFS as a blob store backend in an image upload process. The test case was roughly: upload an image, check that it exists on both the old and the new image storage, delete it, check it again. The test had existed for years. I simply added some assertions to check the existence of the uploaded files on the new storage. Funnily enough, I must have hit some corner case with that, and from that moment on the test was flaky. Simple URL checks started to time out from time to time. That made me really curious. To cut a long story short: after having checked a whole lot of things, it turned out that neither the test nor the shiny new storage was flaky - it was the LVS load balancer. The load balancer dropped connections reproducibly, which happened more often with increasing concurrency. Finally we removed LVS completely and replaced it with DNS-RR + VRRP, which completely solved the problem, and the tests ran happily ever after.
> >
> > Usually there is no pure black and white:
> >
> > - Sometimes testing whole systems reveals problems you'd never have found without them
> > - Sometimes they cause false alerts
> > - Sometimes skipping them masks real problems
> > - Sometimes it sucks if a false alert blocks your release
> >
> > If you want to be really safe, you have to analyze every single failure, decide what kind of failure it is or could be, and decide whether a retry would prove anything or not. At least when you are at a release gate. I think this should be worth it.
> >
> > There's a reason for this thread and there's a reason why people ask every few days which CS version is production-stable. Things have to improve over time. This applies to test implementations, test environments, release processes, and so on. One way to do this is to become a little bit stricter (and a bit better) with every release. Making all tests pass at least once before a release should be a rather low-hanging fruit. Reducing the total number of flaky tests or the "flaky fail rate" may be another future goal.
> >
> > Btw, the fact of the day:
> > I grepped through the dtests and found that roughly 11% of all tests are flagged with "known_failure" and roughly 8% of all tests are flagged with "flaky". Quite impressive.
> >
> > 2016-12-03 15:52 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:
> >
> >> I think it is fair to run a flaky test again. If it is determined that it flaked out due to a conflict with another test or something ephemeral in a long process, it is not worth blocking a release.
> >>
> >> Just deleting it is probably not a good path.
> >>
> >> I actually enjoy writing, fixing and tweaking tests, so ping me offline or whatever.
> >>
> >> On Saturday, December 3, 2016, Benjamin Roth <benjamin.r...@jaumo.com> wrote:
> >>
> >>> Excuse me if I jump into an old thread, but from my experience I have a very clear opinion about situations like this, as I have encountered them before:
> >>>
> >>> Tests are there to give *certainty*.
> >>> *Would you like to pass a crossing at a green light if you cannot be sure that green really means green?*
> >>> Do you want to rely on tests that are green, red, green, red? What if a red is a real red and you missed it because you simply ignored it because it's flaky?
> >>>
> >>> IMHO there are only 3 options for dealing with broken/red tests:
> >>> - Fix the underlying issue
> >>> - Fix the test
> >>> - Delete the test
> >>>
> >>> If I cannot trust a test, it is better not to have it at all. Otherwise people are staring at red lights and start to drive.
> >>>
> >>> This causes:
> >>> - Uncertainty
> >>> - Loss of trust
> >>> - Confusion
> >>> - More work
> >>> - *Less quality*
> >>>
> >>> Just as an example: a few days ago I created a patch. Then I ran the utests and one test failed. Hmmm, did I break it? I had to check it twice by checking out the former state and running the tests again, just to recognize that it wasn't me who made it fail. That's annoying.
> >>>
> >>> Sorry again, I'm rather new here, but what I just read reminded me a lot of situations I was in years ago.
> >>> So: +1, John
> >>>
> >>> 2016-12-03 7:48 GMT+01:00 sankalp kohli <kohlisank...@gmail.com>:
> >>>
> >>>> Hi,
> >>>>     I don't see any update on this thread. We will go ahead and make dtests a blocker for cutting releases for anything after 3.10.
> >>>>
> >>>> Please respond if anyone has an objection to this.
> >>>>
> >>>> Thanks,
> >>>> Sankalp
> >>>>
> >>>> On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie <jmcken...@apache.org> wrote:
> >>>>
> >>>>> Caveat: I'm strongly in favor of us blocking a release on a non-green test board of either utest or dtest.
> >>>>>
> >>>>>> put something in prod which is known to be broken in obvious ways
> >>>>>
> >>>>> In my experience the majority of fixes are actually shoring up low-quality / flaky tests or fixing tests that have been invalidated by a commit but do not indicate an underlying bug. Inferring "tests are failing, so we know we're asking people to put things in prod that are broken in obvious ways" is hyperbolic. A more correct statement would be: "Tests are failing, so we know we're shipping with a test that's failing", which is not helpful.
> >>>>>
> >>>>> Our signal-to-noise ratio with tests has been very poor historically; we've been trying to address that through aggressive triage and assigning out test failures. However, we need far more active and widespread community involvement if we want to truly *fix* this problem long-term.
> >>>>>
> >>>>> On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> >>>>>
> >>>>>> +1. Kind of silly to advise people to put something in prod which is known to be broken in obvious ways.
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <kohlisank...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>     We should not cut a release if dtests are not passing. I won't block 3.10 on this since we are just discussing it.
> >>>>>>>
> >>>>>>> Please provide feedback on this.
> >>>>>>> Thanks,
> >>>>>>> Sankalp
> >>>
> >>> --
> >>> Benjamin Roth
> >>> Prokurist
> >>>
> >>> Jaumo GmbH · www.jaumo.com
> >>> Wehrstraße 46 · 73035 Göppingen · Germany
> >>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> >>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
> >>
> >> --
> >> Sorry this was sent from mobile. Will do less grammar and spell check than usual.


--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
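For reference, the "roughly 11% known_failure / roughly 8% flaky" figures mentioned earlier in the thread came from a simple grep over the dtest sources. A rough sketch of such a count might look like the following; the checkout path, the marker strings and therefore the resulting percentages are assumptions that depend entirely on the state of cassandra-dtest at the time.

```python
# Rough sketch of counting annotated tests in a local cassandra-dtest
# checkout. DTEST_DIR is a hypothetical path; the marker strings and the
# resulting percentages depend on the repository state at the time.
import os
import re

DTEST_DIR = "cassandra-dtest"
TEST_DEF = re.compile(r"^\s*def test_", re.MULTILINE)
MARKERS = ("known_failure", "flaky")

counts = {"tests": 0}
counts.update({marker: 0 for marker in MARKERS})

for root, _dirs, files in os.walk(DTEST_DIR):
    for name in files:
        if not name.endswith(".py"):
            continue
        with open(os.path.join(root, name), encoding="utf-8") as handle:
            source = handle.read()
        counts["tests"] += len(TEST_DEF.findall(source))
        for marker in MARKERS:
            counts[marker] += source.count(marker)

for marker in MARKERS:
    share = 100.0 * counts[marker] / max(counts["tests"], 1)
    print("%s: %d occurrences across %d tests (%.1f%%)"
          % (marker, counts[marker], counts["tests"], share))
```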