On 12-08-30 5:28 PM, Dave Mandelin wrote:
On Thursday, August 30, 2012 9:11:25 AM UTC-7, Ehsan Akhgari wrote:
On 12-08-29 9:20 PM, Dave Mandelin wrote:
On Wednesday, August 29, 2012 4:03:24 PM UTC-7, Ehsan Akhgari wrote:
In my opinion, one of the reasons Talos is disliked is that many
people don't know where its code lives (hint:
http://hg.mozilla.org/build/talos/) and can't run those tests the way
they run other test suites. I think this would be very valuable to fix,
so that developers can read Talos tests like any other test, and fix or
improve them where needed.
It is hard to find. And beyond that, it seems hard to use. It's been a while since I've
run Talos locally, but the last time I did, it was a pain to set up and difficult to run,
and I hear it's still kind of like that. For testing tools, "convenient for the
developer" is a critical requirement, but one that has been neglected in the past.
js/src/jit-test/ is an example of something that is very convenient for
developers: creating a test is just adding a .js file to a directory (no
manifest or extra files; by default an error or crash is a fail, but you can
change that per test), the harness is a Python file with nice options, the
test configuration and basic usage are documented in a README, and it lives in
the tree.
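To make that concrete, here's a minimal sketch of what such a test can look like. The `|jit-test|` directive comment is the harness's mechanism for overriding the default pass/fail expectation; the filename and the particular error are made up for illustration:

    // |jit-test| error: TypeError
    // A hypothetical test file, e.g. js/src/jit-test/tests/basic/my-test.js.
    // No manifest entry is needed; dropping the file in the directory is enough.
    // The directive above tells the harness this script is expected to throw
    // a TypeError, so the error counts as a pass rather than a fail.
    var x = null;
    x.foo; // property access on null throws TypeError

You'd then run it through the harness with something like `python js/src/jit-test/jit_test.py <path-to-js-shell> my-test` (the exact options and invocation are documented in the README).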
Absolutely! We really need to work hard to make them easier to run. I
hear that the Automation team has already been making progress towards
that goal.
[...] I believe
that the bigger problem is that nobody owns watching over these numbers,
and as a result we take regressions in some benchmarks which can
actually be representative of what our users experience.
The interesting thing is that we basically have no idea if that's true for any
given Talos alarm.
That's something that I think should be judged per benchmark. For
example, the Ts measurements will probably correspond very directly to
the startup time that our users experience. The Tp5 measurements don't
directly correspond to anything like that, since nobody loads those
pages sequentially, but they could be an indication of average page-load
performance.
I exaggerated a bit--yes, some tests like Ts are pretty easy to understand and
do correspond to user experience. With Tp5, I just don't know--I haven't spent
any time trying to use it or looking at regressions, since JS doesn't affect it.
Right. I think that, at the very least, on bigger tests like Tp5 we want
to know if something regresses by a large amount, because a large
regression is very likely to reflect an actual behavior change worth
knowing about.
- Speaking of false positives, we should seriously start tracking them. We
should keep track of each Talos regression found and its outcome. (It would be
great to track false negatives too but it's a lot harder to catch them and
record them accurately.) That way we'd actually know whether we have a few
false positives or a lot, or whether the false positives were coming up on
certain tests. And we could use that information to improve the false positive
rate over time.
I agree. Do you have any suggestions on how we would track them?
The details would vary according to the preferences of the person doing it, but I'd
sketch it out something like this: when Talos detects a regression, file a bug to
"resolve" it (i.e., show that it's not a real regression, show that it's an
acceptable regression for the patch, or fix the regression). Then keep a file listing
those bugs (with metadata for each: tests regressed, date, component, etc.), and as
each is closed, mark down the result: false positive, allowed, backed out, or fixed.
That's your data set. Of course, various parts of this could be automated but that's
not required.
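For what it's worth, here's one hypothetical shape such a tracking record could take; the field names, bug number, and values are made up for illustration, not an existing schema:

    // One entry in the hypothetical tracking file sketched above.
    // A record is added when Talos flags a regression, and its outcome
    // is filled in when the corresponding bug is closed.
    var regressionRecord = {
      bug: 123456,                   // made-up number of the "resolve it" bug
      testsRegressed: ["Ts", "Tp5"], // which Talos suites alarmed
      date: "2012-08-30",
      component: "General",
      // Set when the bug closes, to one of the four outcomes above:
      outcome: null                  // "false positive" | "allowed" | "backed out" | "fixed"
    };

With a few months of records like that, computing the false-positive rate overall, or per test, is a simple aggregation.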
Oh, sorry, I should have asked my question better. I'm specifically
wondering who needs to track and investigate the regression if it
happened on a range of, let's say, 5 committers...
Cheers,
Ehsan