On 2014-12-19 4:40 PM, Nils Ohlmeier wrote:
On Dec 19, 2014, at 6:56 AM, Ehsan Akhgari <ehsan.akhg...@gmail.com> wrote:
Let me try to rephrase the problem in different terms, to hopefully make it
clearer why using timers like this is a bad idea.
setTimeout(foo, 1000) may seem to suggest "run foo after 1 second", but that is
*not* what that function does, at all. What it does is run foo after 1
second, or at any other point in time later than that, and it could
even be *minutes* later.
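To make that concrete, here is a minimal sketch (the 1000 ms value is just an
example; the only thing the API guarantees is a lower bound on the delay):

  var scheduled = Date.now();
  setTimeout(function foo() {
    var elapsed = Date.now() - scheduled;
    // All that is guaranteed is elapsed >= roughly 1000; on a starved
    // test slave it can be tens of seconds, or minutes.
    dump("foo ran " + elapsed + " ms after being scheduled\n");
  }, 1000);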
A lot of people are under the assumption that just because their local timings
pan out, and they've added a, let's say, 10x error margin to their timers,
their use of timers is fine. It's not, as has been demonstrated by
intermittent oranges over the years. We have no way of knowing whether those
calculations hold under various types of machine loads, on new platforms, or on
much lower powered machines (think of running tests whose timing measurements
were done on a beefy desktop on a low-end phone.) And increasing the error
margin won't save us; it will just move the problem further into the future.
Ok. That’s fine by me.
First of all I assume that the slowness of the machine also affects mochitest’s
overall timeout, right?
That's correct.
> If not, then we are in big trouble.
Secondly, remember that I’m NOT doing anything useful when my
“missing-event-timer” pops. I’m just logging a message that the event is
missing and then I’m trying to exit the test early (earlier than mochitest’s
general test timeout).
Your approach is _still_ faulty in that you are assuming that section of
your test will reasonably finish in 60s, but if the test slave is
starved, the 60s timer may fire *before* the event you are waiting for
has a chance to be dispatched. Based on that, your approach is prone to
intermittent failures.
Of course, the exact level of problem depends on what you mean by "log"
above. If you mean something that is *actually* side-effect free such
as dump() or info(), then the worst thing that can happen is that you
may see misleading log messages sometimes. But if you do *anything*
which can have side-effects, you should stop using setTimeout ASAP.
That includes things such as ok(false, "the event didn't arrive"),
SimpleTest.finish(), etc.
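Roughly, the distinction is the following (the event name in the first snippet
is made up for illustration):

  // Harmless: worst case is a misleading log line on a starved slave.
  setTimeout(function () {
    info("still waiting for the (hypothetical) 'callEstablished' event...");
  }, 60000);

  // Not harmless: this can fail or end the test early even though the
  // event would still have arrived a bit later.
  setTimeout(function () {
    ok(false, "the event didn't arrive");
    SimpleTest.finish();
  }, 60000);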
At this point though, you should be asking yourself, why go through the
pain of ensuring those timer callbacks are side-effect-free and that
they remain so for the years to come? Really, the problem that you are
trying to protect yourself against doesn't exist. So, please just don't
use setTimeouts like that, ever! :-) That will keep things much
simpler, and less footgun-ish.
The only problem I can see with that approach is: if the “missing-event-timer”
actually pops more or less precisely after the requested time, BUT the actual
work is running in super slow motion for some reason. And so the event would
have arrived, but after the event-timeout.
So if my “missing-event-timers” are popping on time, because they are for some
bizarre reason not affected by the overall slowness of the machine, then we
have a problem with this. But in that case we should have a problem with the
overall test timeout of mochitest as well, right?
The default mochitest timeout is 5 minutes. And you're right, if your
test takes longer than that, it will fail. If it does so
intermittently, then it will fail intermittently. And if that happens,
the solution is to either use SimpleTest.requestLongerTimeout() to
request a longer timeout, or to split up the test. Years of experience
has shown that the default 5 minute timeout is sufficient for all of our
platforms in practice. And in the future if that proves to be too
short, we can change it globally with a one line change (in
TestRunner.js). If individual test authors roll out their own clones of
the harness timeout mechanism, updating them in that situation will
become very very hard, if possible at all.
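For completeness, asking for more time from within a test is a one-liner, e.g.
(the factor here is just an example):

  // Multiply the default harness timeout for this test only.
  SimpleTest.requestLongerTimeout(2);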
Also, please note that letting the test time out if it doesn't succeed is a
perfectly valid failure scenario; we have numerous tests that do things like
fire an async event, set up a listener, wait for the listener to fire and
SimpleTest.finish() there, without any timers to catch the case where the event
does not fire.
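The usual shape of such a test is roughly this (the element id, event name and
helper function are made up for illustration):

  SimpleTest.waitForExplicitFinish();

  var target = document.getElementById("subject");
  target.addEventListener("myevent", function onMyEvent() {
    target.removeEventListener("myevent", onMyEvent);
    ok(true, "got the event");
    SimpleTest.finish();
  });
  startAsyncWork();  // eventually dispatches "myevent" on target
  // No setTimeout here: if the event never fires, the harness timeout
  // fails the test for us.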
My tests are not about testing this one timer. Doing that would be relatively
easy.
In WebRTC I need to orchestrate two participants to establish a real-time audio
video call. Lots of the tests need to establish a full call before assessing
whether the call got properly established under the given criteria. The most
common scenario for intermittent failures is that the call does not even get
established. From start to a full call I need to verify several state machines
and lots of events firing, and all of that twice for the two participants of
each call.
So yes I could simply try to establish a call (and not worrying about all the
intermediary steps and events) and then do the verification/assessment. And let
call failures be caught by the generic mochitest timeout. But then I would
spend hours and hours staring at pretty useless log files (see below) every
time a call got stuck.
Can you please point me to the code in question?
Logging sufficiently is almost always enough to not have to use these timers,
as those tests have demonstrated in practice.
I disagree. Especially as the logging of mochitest is really poor:
- the timestamps of log messages are lost, as described in bug 1020516
- SimpleTest only has one log level: info(). Which I think is the main reason
the log files get big, because test writers can’t fine-tune the logging.
- Log messages from mochitest are not printed together with the stdout/stderr
of the browser (so correlating these two is almost impossible)
- Logging behaves differently on your local machine than on the build servers
- Logging behaves differently on your local machine then on the build servers
Everybody (yours truly included!) has their favorite list of logging
issues, and honestly I've not been paying close attention to the recent
changes such as structured logs, so I'm not able to help you on any of
these specifically. Some of these issues are possible to work around in
your own code (for example you can have your own logging levels and
whatnot, based on a quick grep there's even some prior art, for example
see Logger in accessible/jsat/Utils.jsm). Some of these (such as the
latter two) seem to be issues that you'd also have with your own custom
logging in mochitest, so I'm not sure how the timer based logging is not
affected by them.
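For the log level point specifically, here is a rough sketch of what rolling
your own levels on top of info() could look like (names and levels are made up):

  var LOG_LEVEL = 2;  // 0 = error, 1 = warn, 2 = info, 3 = debug
  var LOG_NAMES = ["ERROR", "WARN", "INFO", "DEBUG"];
  function log(level, msg) {
    if (level <= LOG_LEVEL) {
      info("[" + LOG_NAMES[level] + "] " + msg);
    }
  }
  log(3, "only printed when LOG_LEVEL is raised to 3");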
BTW I understand that WebRTC has pretty special and probably unique
requirements, compared to what the rest of the mochitest users do with it.
That’s why I was trying to make the case for adding a special convenience
function for setting up these special “missing-event-timers”, which BTW would
also allow us to increase or decrease the timeout values based on the
platform/machine the test currently gets executed on.
If there were a way to implement your idea of "missing event timer" in a
way that is not prone to intermittent failures, I'd say go for it. But
I don't understand how you can do that.
But thinking of this more, do you find it useful to be able to register
a callback with the harness to be called when the harness decides to
kill your test because it has timed out? (Obviously the callback would
be informational, and would not be able to prevent the harness from
killing the test and moving on.)
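Purely to illustrate the shape of that idea (no such API exists today; the name
is made up):

  // Hypothetical: the harness would invoke this right before killing a
  // test that hit the global timeout; informational only.
  SimpleTest.registerTimeoutCallback(function () {
    info("harness timeout hit; dumping the current call state for debugging");
  });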
But as I don’t seem to be able to make my case here I’ll simply live with
requesting flaky timeouts for all WebRTC tests.
Well, that would be a terrible conclusion to this thread. :( That
would mean that WebRTC tests will be prone to all sorts of intermittent
test failures caused by setTimeout (both the scenario above, and others
-- as requestFlakyTimeout is a blanket pass) forever.
Please note that intermittent test failures are a real problem. Besides
all of the time wasted by sheriffs/developers/machines to deal with
them, they reduce our ability to tell whether a given checkin is "good",
and *most importantly* they will result in the tests getting disabled.
Tests getting disabled means that areas of our code will be untested, so
we won't be able to detect the regressions that the tests would have
otherwise caught, and we'd end up shipping those bugs to our users.
I'm really trying hard to suggest something that is not flaky for your
needs. Perhaps the above idea would help. If not, maybe I can come up
with something by looking at the code. But simply calling
requestFlakyTimeout in *all* WebRTC tests and moving on is not really an
outcome that I look forward to seeing.
Cheers,
Ehsan
On 2014-12-19 1:07 AM, Boris Zbarsky wrote:
On 12/18/14, 2:28 PM, Nils Ohlmeier wrote:
Well there is an event listener waiting for the event to fire.
But how else than through a timeout, obviously with a high timeout value like
30 or 60 seconds
We've had quite a number of random oranges from precisely this scenario.
It seems that it's not uncommon for the test VMs to completely lose
the timeslice for 60 seconds or more.
If the test is in fact expecting the event to fire, the right thing to
do is to just have the test time out per the harness timeout (which can
be globally adjusted as needed depending on the characteristics of the
test environment) if the event doesn't fire. Rolling your own shorter
timer just means contributing to random orange.
Sure I can print an info message that my test is now waiting for an
event to pop,
Sure, and another one when the event comes in. Then you can check the
log to see whether the timeout was for this reason in the event of a
harness timeout on this test.
But as I tried to describe in my other email, having a long-living timer which
pops complaining that event X is missing is, I think, a legit use case for
setTimeout in tests.
Given the number of random oranges we've had from _precisely_ this use
of setTimeout, I don't think it's legit.
-Boris
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform