On Wed, Aug 29, 2012 at 8:00 PM, Matt Brubeck <mbrub...@mozilla.com> wrote:
> On 08/29/2012 04:03 PM, Ehsan Akhgari wrote:
>> I don't believe that the current situation is acceptable, especially
>> with the recent focus on performance (through the Snappy project), and
>> I would like to ask people if they have any ideas on what we can do to
>> fix this. The fix might be turning off some Talos tests if they're
>> really not useful, asking someone or a group of people to go over
>> these test results, getting better tools for them, etc. But
>> _something_ needs to happen here.
>
> Thanks for starting this discussion. I have some suggestions:
>
> * Less is more. We can pay more attention to tests if every alert is
> for something we care about. We can *track* stuff like Trace Malloc
> Allocs if there are people who find the data useful in their work, but
> we should not *alert* on it unless it is a key user-facing metric.

Yes. I think one major problem is that we have some Talos tests which
measure useless things, and people tend to think that Talos measurements
in general are useless because of that bad reputation. It also increases
the cognitive load of dealing with the problem, as you state.

> * I don't like our reactive approach that focuses on trying to identify
> regressions, and then decide whether to fix them in place, back them
> out, or ignore them. Instead we should proactively set goals for what
> our performance should be, and focus on the best way to get it there
> (or keep it there). The goals should be based on the desired user
> experience, and we should focus only on metrics that reflect those user
> experience goals.

+1.

> * Engineering teams should have to answer for these metrics; for
> example, they should be included in quarterly goals. At Amazon, item #1
> in the quarterly goals for each team was always to meet our metrics
> commitments. Slipping a key metric past a certain threshold should stop
> other work for the team until it's rectified.

We should definitely do that if we care about performance as an
important measure of quality (which is my impression we do. :-)

> * We need staff whose job includes deciding which regressions are
> meaningful, identifying the cause, following up to make sure it's
> backed out or fixed, and refining the process and tools used to make
> all this possible. Too much slips through the cracks when we leave this
> to volunteers (including "employeeteers" like Ehsan or me).

This, a thousand times!

--
Ehsan
<http://ehsanakhgari.org/>