Thanks Andrew.

Gijs, if you'd like to see the notes we took in PDX on this topic, they're here: https://etherpad.mozilla.org/ateam-pdx-intermittent-oranges

Feel free to add more ideas and comments. We're currently working on our Q1 plan and will see how many of these things we can fit in then.

Jonathan

On 12/9/2014 6:24 AM, Andrew Halberstadt wrote:
We had a session on intermittents in PDX. Additionally, we (the ateam) had several brainstorming sessions prior to the work week. I'll try to summarize what we talked about and answer your questions inline at the same time.

On 08/12/14 03:52 PM, Gijs Kruitbosch wrote:
1) make it easier to figure out from bugzilla/treeherder when and where
the failure first occurred
- I don't want to know the first thing that got reported to bmo - IME,
that is not always the first time it happened, just the first time it
got filed.

In other words, can I query treeherder in some way (we have structured
logs now right, and all this stuff is in a DB somewhere?) with a test
name and a regex, to have it tell me where the test first failed with a
message matching that regex?

Structured logs have been around for a few months now, but only recently has mozharness started using them for determining failure status (and even now only for a few suites).

The next step is absolutely to store this stuff in a DB. Starting now and into Q1, we'll be creating a prototype to figure out things like schemas, costs, and logistics. Unlike logs, we want to keep this data forever, so we need to make sure we get it right.

As part of the prototype phase, we plan to answer some simple questions that don't require lots of historical data. Can we identify new flaky tests? Can we normalize chunks based on runtime instead of number of tests?
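To make that second question concrete, here's a rough sketch of what runtime-based chunking could look like. This isn't any existing tool or the planned implementation, just an illustration of greedy assignment by recorded runtime instead of test count; the test names and numbers are made up:

    # Hypothetical sketch: split tests into chunks balanced by recorded runtime
    # rather than by test count, using greedy "longest runtime first" assignment.

    def chunk_by_runtime(test_runtimes, num_chunks):
        """test_runtimes: dict mapping test name -> average runtime in seconds."""
        chunks = [{"tests": [], "total": 0.0} for _ in range(num_chunks)]
        # Assign the longest tests first, each to the currently lightest chunk.
        for test, runtime in sorted(test_runtimes.items(),
                                    key=lambda item: item[1], reverse=True):
            lightest = min(chunks, key=lambda c: c["total"])
            lightest["tests"].append(test)
            lightest["total"] += runtime
        return chunks

    # Example: three test files split into two chunks of roughly equal runtime.
    example = {"test_a.html": 120.0, "test_b.html": 90.0, "test_c.html": 45.0}
    for i, chunk in enumerate(chunk_by_runtime(example, 2), start=1):
        print(i, chunk["total"], chunk["tests"])

The point is that once per-test runtimes live in a DB, rebalancing chunks becomes a query plus a loop rather than hand-tuned manifest splits.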


2) make it easier to figure out from bugzilla/treeherder when and where
the failure happens

3) numbers on how frequently a test fails

I think these both tie into number 1. We aren't sure exactly what the schema will look like, but tying metadata about the test run into the results is obviously something we need to do. These questions would become easy to answer.
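Purely for illustration (the real schema is still being prototyped, and every table and column name below is made up), the kind of query Gijs is asking for in (1) might end up looking something like this:

    # Hypothetical sketch of the query we'd like to support once results are in
    # a DB: "when did test X first fail with a message matching this regex?"
    import re
    import sqlite3

    conn = sqlite3.connect(":memory:")  # placeholder database
    conn.execute("""
        CREATE TABLE failures (
            test_name TEXT,
            message   TEXT,
            revision  TEXT,
            push_time TEXT
        )
    """)
    # A made-up failure record, just so the example returns something.
    conn.execute("INSERT INTO failures VALUES (?, ?, ?, ?)",
                 ("browser_test_foo.js", "Timed out waiting for panel",
                  "abcdef123456", "2014-11-20T10:15:00"))

    def first_matching_failure(conn, test_name, pattern):
        """Earliest failure of test_name whose message matches pattern."""
        regex = re.compile(pattern)
        rows = conn.execute(
            "SELECT message, revision, push_time FROM failures "
            "WHERE test_name = ? ORDER BY push_time ASC",
            (test_name,),
        )
        for message, revision, push_time in rows:
            if regex.search(message):
                return revision, push_time
        return None

    print(first_matching_failure(conn, "browser_test_foo.js", r"Timed out.*panel"))

Whether this ends up as SQL, an ActiveData-style service, or something else is exactly what the prototype phase is meant to figure out.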

We also want to look into cross-correlating data from other systems (e.g. bugzilla, orangefactor, ...) with test results. This will likely be further out, though.


4) automate regression hunting (aka mozregression for intermittent
infra-only failures)

Yes, this is explicitly one of the first things we'll be tackling. Sheriffs often don't have time to go and retrigger backfills, and they shouldn't have to. This loosely depends on (though doesn't strictly require) the DB project outlined above.
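As a back-of-the-envelope illustration of why this needs automation (my numbers, not a spec for the tool): to bisect an intermittent you have to retrigger each candidate push enough times that a green result actually means the regressing change isn't there yet.

    # Rough sketch: if a test fails intermittently with probability p per run,
    # how many retriggers per push keep the chance of wrongly calling an
    # affected push "green" below miss_rate?
    import math

    def retriggers_needed(p, miss_rate=0.05):
        """Smallest n such that (1 - p)**n <= miss_rate."""
        return math.ceil(math.log(miss_rate) / math.log(1.0 - p))

    # A 10% intermittent needs ~29 runs per push for 95% confidence;
    # a 2% intermittent needs ~149. Not something sheriffs should do by hand.
    for rate in (0.10, 0.05, 0.02):
        print(rate, retriggers_needed(rate))

Those retrigger counts multiply across every push in the regression range, which is exactly the kind of grind a machine should be doing for us.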


5) rr or similar recording of failing test runs

We've talked about this before on this newsgroup, but it's been a long
time. Is this feasible and/or currently in the pipeline?

We're aware of rr, but it hasn't been called out as something we should tackle in the short term. My understanding is that there are still a lot of unknowns, and getting it stood up in production infrastructure will likely be a large multi-quarter project. Maybe :roc can clarify here.

I'm not saying we won't do it (it would be awesome), but it seems like there are easier wins we can make in the meantime.


~ Gijs

Other things that we talked about that might make dealing with intermittents better:

* dynamic (maybe also static) analysis of new tests to detect common bad patterns (ehsan has ideas), to be integrated into autoland, a post-commit hook, or some kind of quarantine.

* in-tree chunking/more dynamic test scheduling (ability to schedule only certain tests). One of the end goals here is for the term "chunking" to disappear from the point of view of developers.

* C++ code coverage tied into the build system, with automatically updated reports (I'm working on the build integration pieces on the side).

* automatic filing of intermittents (this is currently what the sheriffs spend the most time on; fixing it would free them up to better monitor the tree).

Thanks for caring about the state of intermittents; they've been neglected for too long. I'm hopeful that 2015 will bring many improvements in this area. And of course, please let us know if you have any other ideas or would like to help out.

-Andrew
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
