On 12-08-30 5:28 PM, Dave Mandelin wrote:
On Thursday, August 30, 2012 9:11:25 AM UTC-7, Ehsan Akhgari wrote:
On 12-08-29 9:20 PM, Dave Mandelin wrote:
On Wednesday, August 29, 2012 4:03:24 PM UTC-7, Ehsan Akhgari wrote:
In my opinion, one of the reasons Talos is disliked is that many
people don't know where its code lives (hint:
http://hg.mozilla.org/build/talos/) and can't run those tests the way
they run other test suites. I think this would be very valuable to fix,
so that developers can read Talos tests like any other test, and fix or
improve them where needed.
It is hard to find. And beyond that, it seems hard to use. It's been a while since I've
run Talos locally, but the last time I did, it was a pain to set up and difficult to run,
and I hear it's still kind of like that. For testing tools, "convenient for the
developer" is a critical requirement, but one that has been neglected in the past.
js/src/jit-test/ is an example of something that is very convenient for
developers: creating a test is just adding a .js file to a directory (no
manifest or extra files; by default an error or crash is a fail, but you can
change that per test), the harness is a Python file with nice options, the
test configuration and basic usage are documented in a README, and it lives in
the tree.
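To make that concrete, here's a minimal sketch of what such a test can look like. The `|jit-test|` directive comment is the harness's mechanism for overriding the default pass/fail expectation; the filename and the particular error are made up for illustration:

    // |jit-test| error: TypeError
    // A hypothetical test file, e.g. js/src/jit-test/tests/basic/my-test.js.
    // No manifest entry is needed; dropping the file in the directory is enough.
    // The directive above tells the harness this script is expected to throw
    // a TypeError, so the error counts as a pass rather than a fail.
    var x = null;
    x.foo; // property access on null throws TypeError

You'd then run it through the harness with something like `python js/src/jit-test/jit_test.py <path-to-js-shell> my-test` (the exact options and invocation are documented in the README).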
Absolutely! We really need to work hard to make them easier to run. I
hear that the Automation team has already been making progress towards
that goal.
[...] I believe
that the bigger problem is that nobody owns watching over these numbers,
and as a result we take regressions in some benchmarks which can
actually be representative of what our users experience.
The interesting thing is that we basically have no idea if that's true for any
given Talos alarm.
That's something that I think should be judged per benchmark. For
example, the Ts measurements will probably correspond very directly to
the startup time that our users experience. The Tp5 measurements don't
directly correspond to anything like that, since nobody loads those
pages sequentially, but they could be an indication of average page-load
performance.
I exaggerated a bit--yes, some tests like Ts are pretty easy to understand and
do correspond to user experience. With Tp5, I just don't know--I haven't spent
any time trying to use it or looking at regressions, since JS doesn't affect it.
Right. I think that, at the very least, on bigger tests like Tp5 we want
to know if something regresses by a large amount, because a large
regression is very likely to reflect an actual behavior change worth
knowing about.
- Speaking of false positives, we should seriously start tracking them. We
should keep track of each Talos regression found and its outcome. (It would be
great to track false negatives too but it's a lot harder to catch them and
record them accurately.) That way we'd actually know whether we have a few
false positives or a lot, or whether the false positives were coming up on
certain tests. And we could use that information to improve the false positive
rate over time.
I agree. Do you have any suggestions on how we would track them?
The details would vary according to the preferences of the person doing it, but I'd
sketch it out something like this: when Talos detects a regression, file a bug to
"resolve" it (i.e., show that it's not a real regression, show that it's an
acceptable regression for the patch, or fix the regression). Then keep a file listing
those bugs (with metadata for each: tests regressed, date, component, etc.), and as
each is closed, mark down the result: false positive, allowed, backed out, or fixed.
That's your data set. Of course, various parts of this could be automated but that's
not required.
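For what it's worth, here's one hypothetical shape such a tracking record could take; the field names, bug number, and values are made up for illustration, not an existing schema:

    // One entry in the hypothetical tracking file sketched above.
    // A record is added when Talos flags a regression, and its outcome
    // is filled in when the corresponding bug is closed.
    var regressionRecord = {
      bug: 123456,                   // made-up number of the "resolve it" bug
      testsRegressed: ["Ts", "Tp5"], // which Talos suites alarmed
      date: "2012-08-30",
      component: "General",
      // Set when the bug closes, to one of the four outcomes above:
      outcome: null                  // "false positive" | "allowed" | "backed out" | "fixed"
    };

With a few months of records like that, computing the false-positive rate overall, or per test, is a simple aggregation.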
Oh, sorry, I should have asked my question better. I'm specifically
wondering who needs to track and investigate the regression if it
happened on a range of, let's say, 5 committers...
Cheers,
Ehsan