Re: PSA: Flaky timeouts in mochitest-plain now fail newly added tests

Ehsan Akhgari Wed, 17 Dec 2014 18:34:02 -0800

On 2014-12-17 8:33 PM, Nils Ohlmeier wrote:

On Dec 12, 2014, at 10:34 AM, Ehsan Akhgari <ehsan.akhg...@gmail.com> wrote:

We had a session on intermittent test failures in Portland <
https://etherpad.mozilla.org/ateam-pdx-intermittent-oranges>, and one of
the things that we discussed was adding analyses to our test suites that
detect known bad test writing practices <
https://developer.mozilla.org/en-US/docs/Mozilla/QA/Avoiding_intermittent_oranges>
and fails the offending tests.  As part of that effort, I just landed the
patch to bug 649012 which enables this analysis on mochitest-plain.


Great idea.

What this means for you as a test author:
* Using setTimeout(foo, N); where N is not 0 will cause your tests to
fail.  You should double check why you're using such timeouts.  They are
almost never necessary.
* If you have a legitimate reason for using timeouts like this (such as if
your test needs to wait a bit to make sure an event doesn't happen in the
future


What about the other way around?
My test is waiting for an event to proceed with the test. All of our WebRTC
mochitests highly depend on events from the WebRTC API’s. Now the
obvious answer is: mochitest has a generic timeout of 300 seconds (I think)
which will terminate the test in case the expected event never fired. The
problem I have with that is that as said before the WebRTC tests depend
on whole sequence of events to happen. And if the generic mochitest timeout
kills a test it is very hard to tell which events exactly was missing, 
especially
as we are orchestrating two actors in the browser and both wait for the same
events.

Usually a good way of handling this problem is to log the steps in thetest via info() so that you can go back and look at the log to see howmuch of the test was executed. (I think these days you need to callSimpleTest.requestCompleteLog() to make that actually work.)

For that reason I got used to the habit of creating a timeout while the test is
waiting for an event to happen, because if that timer fires I can precisely
print an error message what is missing and why test is exiting early (besides
the fact that the test is not blocking for another 2xx seconds).

If the above is not an option, this is a legitimate use of setTimeout.You can describe this in the string argument to requestFlakyTimeout().

So while I totally agree with the description here
https://developer.mozilla.org/en-US/docs/Mozilla/QA/Avoiding_intermittent_oranges
that it is bad to magically wait for time and hope that it is safe to proceed
with your test, I do not agree with the assessment that setTimeout in general
is evil.

I think the text in<https://developer.mozilla.org/en-US/docs/Mozilla/QA/Avoiding_intermittent_oranges#Using_magical_timeouts_to_cause_delays>describes the issue with setTimeout well. Nobody is definitely sayingthat setTimeout in general is evil, but an automated analysis has no wayof knowing, which is why we have the current requestFlakyTimeout mechanism.

, and there is no other way of checking that), you should call
SimpleTest.requestFlakyTimeout("reason");.  The argument to the function is
a string which is meant to document why you need to use these kinds of
timeouts, and why that doesn't lead into intermittent test failures.


If we really want to have people not use setTimeout at all in test, but have a
legitimate use case for it:
- your test needs to wait for X time and check that either an event happened
or is missing and exit the test execution if the check failed
Can we then maybe add a connivence function for exactly that purpose?
I would prefer a connivence function, because that would allow us to still
catch the evil wait magically for a little bit as bad behavior. Where right now
I have no other choice then requesting to lift the whole protection via
requestFlakyTimeout().

Well, for most people the existing mochitest timeout mechanism plus thelog messages as the test proceeds is enough in that case. I understandthat adding the proper log messages after the fact can be a lot of work(although I don't know how big the WebRTC test suite that uses thispattern is.)

What I really do not like is the name chosen for this, as it seems to suggest
that timeouts in general are flaky, which they are not (see above).

Can we please rename this to something more appropriate like
   SimpleTest.requestUnreliableTimeout(“reason”);
Because I think with “flaky” you mean unreliable.

I really don't see either name as clearly better than the other, andthat is a lot of sed'ing (around 580 occurrences of that phrase), but ifyou feel strongly enough to write a patch, I'd be happy to review. :-)

And while we are on improving this: please, please, please improve the error
message. Right now someone who writes a new test he will see something
like this in his try logs:
  "Test attempted to use a flaky timeout value 5000"
And he will be baffled like me and some colleagues this morning about the
error message and what it means. Would it have been acceptable if I
would have used 4999 instead of 5000? In case of the WebRTC tests you can
very easily write a legit test case without ever calling setTimeout yourself,
but some of the generic helper code will do it for the test writer. And then
the current error message essentially means nothing to him.

Yes, we can and should definitely improve the error message. Since youhit this in practice, what error message would have helped youunderstand the issue better? Perhaps we should at least link to<https://developer.mozilla.org/en-US/docs/Mozilla/QA/Avoiding_intermittent_oranges>?


Thanks for the feedback!
Ehsan
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

Re: PSA: Flaky timeouts in mochitest-plain now fail newly added tests

Reply via email to