Re: [Twisted-Python] Some comments regarding #5190 - ``RFC 6125 ("Service Identity") implementation´´

exarkun Sat, 03 May 2014 05:22:18 -0700

On 1 May, 07:23 pm, gl...@twistedmatrix.com wrote:

On Apr 30, 2014, at 5:21 AM, exar...@twistedmatrix.com wrote:
I've just noticed that the changeset for #5190 included some untested code. Specifically, there are no tests for the code which detects missing dependencies and emits warnings about them.
My bad. Well, technically hawkowl's bad; hawkowl is a committer and did the review and therefore has all the criminal liability in this case, but as the author who wrote the code I bear some responsibility, at least in some abstract, hypothetical sense ;-).

My hope is that drawing attention to examples of this kind of mistake will help us avoid making the mistake in the future. Considering what my email prompted you to write, I think it may work. :)

Thanks for working on the fix; it looks like the relevant ticket is <https://twistedmatrix.com/trac/ticket/7097>. I'll try to review that as soon as it's ready; let me know.

No problem. I probably should have started my previous email with thanks to you and hawkowl for working on that feature. It is *really* good to have service identity checking support in Twisted.

I'd previously noticed that this code was broken but hadn't realized this was because it was untested.
Neither the author nor the reviewer realized this either, apparently. Certainly it wasn't an intentional omission.
I don't think there's any disagreement whatsoever over Twisted's testing requirements. All code must have full line and branch coverage (as reported by the coverage.py tool). Developers, please write tests for all of your code (and please learn to do test-driven development - it will make this task easier, I promise). Reviewers, please don't accept proposed changes that include untested code.
The problem with code like this is that, in some configurations, it is in fact reported as covered by coverage.py. It requires manual examination to get the intersection of a diff and a coverage report, and even when you do, we still have too many places where it's "okay" to skip coverage.

This is true - but I'm not sure the code in this case is particularly special. It's nearly always possible to write code and tests such that coverage.py says your code is covered but without actually having any meaningful test coverage of the implementation. After all, coverage.py only knows that a line ran or didn't.
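A minimal sketch of the "executed" versus "tested" distinction, using only the standard library. The helper `format_missing` and the test names are hypothetical and are not taken from the #5190 changeset. The first test case runs every line and branch of the helper, so a line-based tool would likely report full coverage, yet it would keep passing even if the message were garbled; only the second test case pins down the behavior.

```python
import unittest


def format_missing(names):
    """Hypothetical helper: describe missing optional dependencies."""
    if not names:
        return ""
    return "Missing optional dependencies: " + ", ".join(sorted(names))


class ExecutedOnly(unittest.TestCase):
    def test_runs(self):
        # Both branches execute, but nothing is asserted about the result.
        format_missing([])
        format_missing(["idna"])


class ActuallyTested(unittest.TestCase):
    def test_nothing_missing(self):
        self.assertEqual(format_missing([]), "")

    def test_names_are_sorted_and_separated(self):
        self.assertEqual(
            format_missing(["b", "a"]),
            "Missing optional dependencies: a, b",
        )


def run_suite():
    """Run both test cases and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(ExecutedOnly),
        loader.loadTestsFromTestCase(ActuallyTested),
    ])
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Dropping the space after the comma in `format_missing` would leave `ExecutedOnly` green and make only `ActuallyTested` fail, which is the gap a coverage report cannot show.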

The problem of platform- or environment-specific code requiring multiple branches which can never all run in any one environment is an extra challenge but I think a widely applicable solution is to not do that. To add to your comments below, if there is platform- or environment-specific code then parameterize it on the environment and write tests for all of the cases.
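One way to read "parameterize it on the environment" is sketched below. This is an illustration under assumptions, not the code from #5190: the function names, the module names in `REQUIRED`, and the message text are all placeholders. The decision logic takes the set of available modules as an argument, so every branch can be exercised on any machine, and only a thin wrapper consults the real environment.

```python
import importlib.util
import warnings

# Placeholder module names, chosen only for illustration.
REQUIRED = ("service_identity", "idna")


def warn_if_missing(available, required=REQUIRED):
    """Warn once per required module absent from `available`.

    `available` is passed in rather than discovered, so tests can
    cover both the "installed" and "not installed" cases anywhere.
    """
    missing = [name for name in required if name not in available]
    for name in missing:
        warnings.warn(
            "Optional dependency %r is not installed." % (name,),
            UserWarning,
            stacklevel=2,
        )
    return missing


def warn_about_real_environment(required=REQUIRED):
    """Thin wrapper: the only part that depends on what is installed."""
    available = {
        name for name in required
        if importlib.util.find_spec(name) is not None
    }
    return warn_if_missing(available, required)


def collect(available):
    """Test helper: return (missing, messages) for a fake environment."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        missing = warn_if_missing(available)
    return missing, [str(w.message) for w in caught]
```

With this shape, `collect(set())` and `collect(set(REQUIRED))` cover both branches in one test run, with no need to combine coverage data from differently configured builders.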

As the author I looked at coverage periodically and it looked sort of like what I expected. Since I was testing multiple installed-library configurations I had used "coverage combine" which misleadingly told me that it was all covered (although this particular code should have been tested independently without requiring a combined run). And I'm sure the reviewer thought about it a little bit, but even if they'd looked at a coverage report, it might have looked like it was OK to skip these particular lines. And I was in fact doing test-driven development; I didn't add the warning code there until I was looking at a failing test because one of the buildbots didn't have one of my expected dependencies installed, and I made my tests pass locally by having an environment without those dependencies installed locally either.
Yes, I understand how this isn't really 100% TDD, and that a failure on a buildbot should have resulted in me writing a new test; mistakes were made etc. But all TDD necessarily involves the occasional error/error/pass where there really ought to have been a pass/fail/pass - if we understood what was going on with all of our code all the time we probably wouldn't need tests in the first place :-). It's a bit disingenuous to say that I need to "learn to do test-driven development" to avoid mistakes like this, though.
On the other side of the equation, I imagine that a reviewer looking at this, even carefully considering coverage, might see a missed line on some buildbot or in their local run and then think "oh, of course, but that line will be run if I had/didn't have that library installed". And there are some bits of code which are acceptable to cover in this manner (except they should have direct test coverage from actual tests, rather than just importing the test module, which coverage.py won't show you). It's a quite subtle point to understand that this particular kind of code should actually be fully covered in all configurations. Especially because these tests are smack in the middle of a file which will be validly missing coverage in some supported configurations (no pyOpenSSL installed) and surrounded with thickets of conditionals and test skips to optionally import more dependencies than just this one.
We should remain vigilant, but I think that if we want to really reduce errors like this in the future we need to make them easier to spot. Failing that we need to have more specific suggestions. In this case, I happen to know that I do TDD and that hawkowl is aware of the standard on coverage issues (and is at least aware of coverage.py, whether or not it was run as part of this review), so those two suggestions aren't going to help as we're already doing them. Any time the solution to a problem is "everybody should just try harder" that seems like a bet against human fallibility.
So until someone has a month to spend on an all-singing all-dancing combined ratcheting coverage report for all the builders and a fantastic visualization for its output which highlights every possible coverage issue, here are some specific suggestions which might avoid some parts of this class of error:

I don't think we even have a plan for a tool that will report whether a change introduces code that isn't *really* tested (contrast "tested" with "executed").

I think this may be an area where we do actually need to rely on people doing a good job. Perhaps to counterbalance that we need to eliminate more of the other crap involved in the development process? For example, if reviewers never had to spend any time thinking about whether the whitespace in a change was correct, they would have that much more brain power to apply to assessing the quality of the test suite.

For authors (what I could have done better):
1. I know I said they're inevitable, but whenever you get an error/pass, always consider where you could make it a clean fail/pass instead. You (and by "you" I obviously mean "me") think you understand why an error happened but the only way to really demonstrate you understand it well enough is to convert it into an assertion that fails with a useful error message.
2. Be intensely suspicious of any code that needs to run at import time. I did stuff the warning into a function, which at least doesn't leak local variables, but I probably could have moved this warning somewhere easier to manage, and would have noticed warnings coming out of tests as opposed to just being printed at the beginning. Declarative tools like deprecatedModuleAttribute automate some of the magic for making code-level artifacts emit warnings when bound to and used rather than as accidents of their initial import, so make use of those. (I'm still thinking about how I could have applied that in this specific case; I probably could have.)
3. Configure your development environment to be more aggressive about warnings (at least for now, eventually trial should fix this for you, see <https://twistedmatrix.com/trac/ticket/6348>). I don't think it would have helped in this particular case because the warning itself is emitted at import time (see point 2) but this sort of mistake crops up unfortunately frequently related to deprecation warnings, which are a bit more common, and could often be caught by a better setup. I recently changed my PYTHONWARNINGS environment variable to 'all::DeprecationWarning,all::UserWarning', and that seems to catch most things. (Unfortunately setting it to simply 'all' produces too much noise from the stdlib and dependencies so it's better to be slightly more restrictive.)
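For anyone who would rather turn on the same filters from inside Python than through the environment, here is a rough in-process equivalent of the PYTHONWARNINGS setting above, as a sketch. In a filter specification 'all' is an alias for the "always" action; the helper names `enable_noisy_warnings`, `old_api`, and `call_and_record` are made up for this example.

```python
import warnings


def enable_noisy_warnings():
    """Always show DeprecationWarning and UserWarning, leave the rest alone.

    Roughly mirrors PYTHONWARNINGS='all::DeprecationWarning,all::UserWarning'.
    """
    for category in (DeprecationWarning, UserWarning):
        warnings.filterwarnings("always", category=category)


def old_api():
    """Hypothetical deprecated function that warns when called."""
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def call_and_record():
    """Call old_api with the noisy filters on and report what was emitted."""
    # catch_warnings restores the previous filters on exit, so the
    # change stays local to this block.
    with warnings.catch_warnings(record=True) as caught:
        enable_noisy_warnings()
        result = old_api()
    return result, [w.category for w in caught]
```

Because the warning here is emitted when `old_api` is called, not at import time, a test can capture it and assert on it directly, which is the property point 2 is after.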
For reviewers (what hawkowl could have done better):
1. Run coverage. Particularly, run coverage just on the relevant and changed test modules, and make sure the system under test gets run directly and not just accidentally executed by running the code.
2. I know I've been reminding reviewers lately to give clear feedback about what elements of reviews are suggestions and which are required fixes for violations of policy, and that may produce the subjective impression that I've been asking for faster or less careful reviews. If so, I should correct that impression: I would like there to be less bike shedding, but it's still pretty important that the £10 million reactor actually work. Any lack of test coverage is at least a potential policy violation. Even if you think you understand why it's missing, even if it looks like a platform variance that doesn't make sense to test on the machine you're running, always ask the author to explain or justify why coverage isn't there, if it could be added to a cross-platform test with a reasonable (or, in many cases, even an existing) fake; if there's no relevant fake and it would be too much work, maybe we need to file a ticket for implementing some test support.
3. Especially if you're dealing with a new feature or a significant behavior change, always try to actually run and interact with the code and look at its output. In this case, noticing the whitespace / formatting errors in the warning messages might have led us to spot the coverage error earlier. (Jean-Paul made some comments to me when he noticed it, but it was an off-the-cuff thing after the branch had already been landed and not part of a code review; context is important here, as evidenced by the fact that it took him some time to realize that it was indicative of a test coverage issue!)

Thanks! These are great suggestions. Can we record them in a way that lets all Twisted contributors learn from this case (instead of only the people reading this thread) - but without adding to the already unreasonably large quantity of text new contributors are ostensibly already responsible for reading?


How's the unified Contributing-to-Twisted documentation effort coming?

Jean-Paul

_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
