Hi Michael,

Thanks for this update. As a newbie, it helped me understand the organization and the processes a little better.
I don't know how many CS devs know this, but I love this rule (actually the whole book): http://programmer.97things.oreilly.com/wiki/index.php/The_Boy_Scout_Rule

To be honest, I'm personally not the kind of guy who walks through lists looking for issues that could be picked up and done, but if I encounter anything (a test, some weird code, a design, whatever) that deserves to be improved, analyzed or fixed and I have a little time left, I try to improve or fix it.

Right now I am still quite new around here and in the process of understanding the whole picture of Cassandra's behaviour, code, processes and organization. I hope you can forgive me if I don't get the point perfectly right every time - but I am eager to learn and improve. Thanks for your patience!

2016-12-04 19:33 GMT+01:00 Michael Shuler <mich...@pbandjelly.org>:

> Thanks for your thoughts on testing Apache Cassandra, I share them.
>
> I just wanted to note that the known_failure() annotations were recently removed from cassandra-dtest [0], because annotations were not being removed once bugs were fixed, and the internal webapp we were using to parse them has been broken for quite some time, with no fix in sight. The webapp was removed and we dropped all the known_failure() annotations.
>
> The test-failure JIRA label [1] is what we've been using during test run triage. The tickets assigned to 'DS Test Eng' still need to be sorted out as either a test problem or a Cassandra problem. Typically, the Unassigned tickets were determined to be possibly a Cassandra issue. If you enjoy test analysis and fixing tests, please jump in and analyze/fix them!
>
> [0] https://github.com/riptano/cassandra-dtest/pull/1399
> [1] https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20labels%20%3D%20test-failure%20AND%20resolution%20%3D%20unresolved
>
> --
> Kind regards,
> Michael Shuler
>
> On 12/04/2016 02:07 AM, Benjamin Roth wrote:
> > Sorry for jumping in so boldly before.
> >
> > TL;DR:
> >
> > - I didn't mean to delete every flaky test just like that
> > - To improve quality, each failing test has to be analyzed individually for a release
> >
> > More thoughts on that:
> >
> > I had a closer look at some of the tests tagged as flaky and realized that the situation here is more complex than I thought before.
> > Of course I didn't mean to delete all the flaky tests just like that. Maybe I should rephrase it a bit to "If a (flaky) test can't really prove something, then it is better not to have it". Whether a test proves something depends on its intention, its implementation, on how flaky it really is and, first of all, on why it is flaky.
> >
> > These dtests are maybe a blessing and a curse at the same time. On the one hand there are things you cannot test with a unit test, so you need them for certain cases. On the other hand, dtests do not only test the desired case.
> >
> > - They test the test environment (ccm, server hiccups) and, more or less, all components of the CS daemon that are somehow involved as well.
> > - This exposes the test to many more error sources than the bare test case, which of course creates a lot of "unreliability" in general and causes flaky results.
> > - It makes it hard to pin down a failure to a certain cause, like:
> >   - a flaky test implementation
> >   - flaky bugs in the SUT
> >   - an unreliable test environment
> > - Analyzing every failure is a pain. But a simple "retry and skip over" _may_ mask a real problem (see the sketch below).
> >
> > => Difficult situation!
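To make the "retry and skip over" trade-off concrete, here is a minimal, purely hypothetical sketch of a bounded-retry wrapper for a known-flaky test. It is not the known_failure() machinery that was removed from cassandra-dtest; the names retry_flaky and max_attempts are made up for illustration.

```python
# Hypothetical sketch only: a bounded-retry wrapper for a known-flaky test.
# This is NOT the known_failure() annotation removed from cassandra-dtest;
# the names (retry_flaky, max_attempts) are made up for illustration.
import functools
import random


def retry_flaky(max_attempts=3):
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return test_func(*args, **kwargs)
                except AssertionError as exc:
                    last_error = exc
                    print("attempt %d/%d failed: %s" % (attempt, max_attempts, exc))
            raise last_error  # the test only fails if every attempt failed
        return wrapper
    return decorator


@retry_flaky(max_attempts=3)
def test_sometimes_times_out():
    # Stand-in for a dtest that occasionally hits an environment hiccup
    # (or a real, intermittent bug -- the retry cannot tell the difference).
    assert random.random() > 0.3, "simulated timeout"
```

Run directly, test_sometimes_times_out() passes most of the time even though roughly 30% of individual attempts fail, which is exactly the masking effect described above: the retry hides an intermittent failure whether its cause is the environment or a genuine bug.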
> > From my own projects and non-CS experience I can tell: flaky tests give me a bad feeling and always leave a certain smell. I've also just skipped them with the reasoning "Yes, I know it's flaky, I don't really care about it". But it simply does not feel right.
> >
> > A real-life example from another project:
> > Some weeks ago I wrote functional tests to test the integration of SeaweedFS as a blob store backend in an image upload process. The test case was roughly: upload an image, check that it exists on both the old and the new image storage, delete it, check it again. The test had existed for years. I simply added some assertions to check the existence of the uploaded files on the new storage. Funnily enough, I must have hit some corner case with that, and from that moment on the test was flaky. Simple URL checks started to time out from time to time. That made me really curious. To cut a long story short: after having checked a whole lot of things, it turned out that neither the test nor the shiny new storage was flaky - it was the LVS load balancer. The load balancer dropped connections reproducibly, which happened more often with increasing concurrency. Finally we removed LVS completely and replaced it with DNS-RR + VRRP, which completely solved the problem, and the tests ran happily ever after.
> >
> > Usually there is no pure black and white:
> >
> > - Sometimes testing whole systems reveals problems you'd never have found without them
> > - Sometimes they cause false alerts
> > - Sometimes skipping them masks real problems
> > - Sometimes it sucks if a false alert blocks your release
> >
> > If you want to be really safe, you have to analyze every single failure, decide what kind of failure it is or could be, and decide whether a retry would prove anything or not. At least when you are at a release gate. I think this should be worth it.
> >
> > There's a reason for this thread and there's a reason why people ask every few days which CS version is production-stable. Things have to improve over time. This applies to test implementations, test environments, release processes, and so on. One way to do this is to become a little bit stricter (and a bit better) with every release. Making all tests pass at least once before a release should be a rather low-hanging fruit. Reducing the total number of flaky tests or the "flaky fail rate" may be another future goal.
> >
> > Btw, the fact of the day:
> > I grepped through the dtests and found that roughly 11% of all tests are flagged with "known_failure" and roughly 8% of all tests are flagged with "flaky". Quite impressive.
> >
> > 2016-12-03 15:52 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:
> >
> >> I think it is fair to run a flaky test again. If it is determined that it flaked out due to a conflict with another test or something ephemeral in a long process, it is not worth blocking a release.
> >>
> >> Just deleting it is probably not a good path.
> >>
> >> I actually enjoy writing, fixing and tweaking tests, so ping me offline or whatever.
> >>
> >> On Saturday, December 3, 2016, Benjamin Roth <benjamin.r...@jaumo.com> wrote:
> >>
> >>> Excuse me if I jump into an old thread, but from my experience I have a very clear opinion about situations like this, as I have encountered them before:
> >>>
> >>> Tests are there to give *certainty*.
> >>> *Would you like to pass a crossing at a green light if you cannot be sure that green really means green?*
> >>> Do you want to rely on tests that are green, red, green, red? What if a red is a real red and you missed it because you simply ignored it because it's flaky?
> >>>
> >>> IMHO there are only 3 options for dealing with broken/red tests:
> >>> - Fix the underlying issue
> >>> - Fix the test
> >>> - Delete the test
> >>>
> >>> If I cannot trust a test, it is better not to have it at all. Otherwise people are staring at red lights and start to drive.
> >>>
> >>> This causes:
> >>> - Uncertainty
> >>> - Loss of trust
> >>> - Confusion
> >>> - More work
> >>> - *Less quality*
> >>>
> >>> Just as an example: a few days ago I created a patch. Then I ran the utests and one test failed. Hmmm, did I break it? I had to check it twice by checking out the former state and running the tests again, just to recognize that it wasn't me who made it fail. That's annoying.
> >>>
> >>> Sorry again, I'm rather new here, but what I just read reminded me a lot of situations I was in years ago.
> >>> So: +1, John
> >>>
> >>> 2016-12-03 7:48 GMT+01:00 sankalp kohli <kohlisank...@gmail.com>:
> >>>
> >>>> Hi,
> >>>>     I don't see any update on this thread. We will go ahead and make dtests a blocker for cutting releases for anything after 3.10.
> >>>>
> >>>> Please respond if anyone has an objection to this.
> >>>>
> >>>> Thanks,
> >>>> Sankalp
> >>>>
> >>>> On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie <jmcken...@apache.org> wrote:
> >>>>
> >>>>> Caveat: I'm strongly in favor of us blocking a release on a non-green test board of either utest or dtest.
> >>>>>
> >>>>>> put something in prod which is known to be broken in obvious ways
> >>>>>
> >>>>> In my experience the majority of fixes are actually shoring up low-quality / flaky tests or fixing tests that have been invalidated by a commit but do not indicate an underlying bug. Inferring "tests are failing, so we know we're asking people to put things in prod that are broken in obvious ways" is hyperbolic. A more correct statement would be: "Tests are failing, so we know we're shipping with a test that's failing", which is not helpful.
> >>>>>
> >>>>> Our signal-to-noise ratio with tests has been very poor historically; we've been trying to address that through aggressive triage and assigning out test failures. However, we need far more active and widespread community involvement if we want to truly *fix* this problem long-term.
> >>>>>
> >>>>> On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> >>>>>
> >>>>>> +1. Kind of silly to advise people to put something in prod which is known to be broken in obvious ways.
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <kohlisank...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>     We should not cut a release if dtests are not passing. I won't block 3.10 on this since we are just discussing it.
> >>>>>>>
> >>>>>>> Please provide feedback on this.
> >>>>>>> Thanks,
> >>>>>>> Sankalp
> >>>
> >>> --
> >>> Benjamin Roth
> >>> Prokurist
> >>>
> >>> Jaumo GmbH · www.jaumo.com
> >>> Wehrstraße 46 · 73035 Göppingen · Germany
> >>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> >>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
> >>
> >> --
> >> Sorry this was sent from mobile. Will do less grammar and spell check than usual.


--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
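For reference, the "roughly 11% known_failure / roughly 8% flaky" figures mentioned earlier in the thread came from a simple grep over the dtest sources. A rough sketch of such a count might look like the following; the checkout path, the marker strings and therefore the resulting percentages are assumptions that depend entirely on the state of cassandra-dtest at the time.

```python
# Rough sketch of counting annotated tests in a local cassandra-dtest
# checkout. DTEST_DIR is a hypothetical path; the marker strings and the
# resulting percentages depend on the repository state at the time.
import os
import re

DTEST_DIR = "cassandra-dtest"
TEST_DEF = re.compile(r"^\s*def test_", re.MULTILINE)
MARKERS = ("known_failure", "flaky")

counts = {"tests": 0}
counts.update({marker: 0 for marker in MARKERS})

for root, _dirs, files in os.walk(DTEST_DIR):
    for name in files:
        if not name.endswith(".py"):
            continue
        with open(os.path.join(root, name), encoding="utf-8") as handle:
            source = handle.read()
        counts["tests"] += len(TEST_DEF.findall(source))
        for marker in MARKERS:
            counts[marker] += source.count(marker)

for marker in MARKERS:
    share = 100.0 * counts[marker] / max(counts["tests"], 1)
    print("%s: %d occurrences across %d tests (%.1f%%)"
          % (marker, counts[marker], counts["tests"], share))
```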