Re: Project Stockwell (reducing intermittents) - March 2017 update

Marco Bonardo Wed, 08 Mar 2017 05:56:58 -0800

On Wed, Mar 8, 2017 at 2:12 PM, Kartikaya Gupta <kgu...@mozilla.com> wrote:


> What makes me sad is all the
> developers in this thread trying to push back against disabling of
> clearly problematic tests, asking for things like tracking bugs and
> needinfos, when the reality is that the developers just don't have the
> time to deal with new bugs or needinfos.


I sort of disagree with the sentiment. We're all well aware of the great
job sheriffs and people like Joel are doing. I also did sheriffing in my
spare time for quite a while in the past, even outside work hours (on
Saturdays and Sundays), so I know at least partly what I'm talking about.
I'd thank them every minute.


> I think we have a structural problem where developers are insulated
> from the actual fallout of bad tests.


That was indeed one of the reasons to introduce dedicated sheriffs and fast
backouts, so developers can spend less time looking at the tree and more
time coding.


> But I feel fundamentally the problem is that
> developers have no real incentive (except for "pride") to fix
> intermittents. Even disabling tests is not really an incentive, as
> there is no negative effect to the developer when it happens.


I feel quite bad if one of the tests in the modules I own gets disabled;
that has quite a negative effect on me. But I agree there is no incentive
to fix intermittents, or better, there are no dedicated resources for
that (more later).


> The
> naive solution to aligning incentives is to make more developers
> responsible for starring failures.


We already know this didn't work.


> Another potential solution is block developers until they fix their
> tests.


The first problem I see is figuring out who "owns" the failing test. I
have tests in the modules I own that have some intermittent failures (not
so frequent, luckily), but those modules have no team to fix them. Should I
be blocked until I fix all the intermittent tests in my modules? If not me,
who should be? How do I find developers with time to help me fix those
failures? I have no idea.
It also sounds like an expensive trade-off to suggest to management: if
that person is working on a critical project for the quarter, blocking them
to work on an intermittent that may be tricky to solve and take days will
have a huge cost.

Btw, my opinion here is that the situation will never improve until there's
a general recognition that intermittents have a cost and thus teams should
dedicate part of their time to them. So far the burden has been put on
individuals who try to volunteer time between planned projects to fix
tests they know something about. But no team, afaik, has dedicated
planning for these issues.
Why is triaging and prioritizing fixes for intermittent failures not
officially part of every team's weekly planning?
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform