On 29/09/12 04:14 PM, Justin Lebar wrote:
One proposal that's been made elsewhere 
(https://bugzilla.mozilla.org/show_bug.cgi?id=791385) is to have a soft limit 
of one active push per developer on try. If you try to push a second time before
your previous jobs are all finished, you will be asked to cancel your previous 
jobs. There would be some kind of manual override that would allow you to push 
additional patches.

I think this would likely be much less impactful than bholley's
proposed -p any, since in the common circumstance where I push to try,
notice it's going permaorange on all platforms, and then want to
cancel all remaining builds/tests, I've already wasted a lot of
resources which would have been saved by -p any.

That's not to say it's not an interesting idea; I just hope it gets
prioritized appropriately.

Also, I hope this manual override is not a pain to use.  Pretty please?  :)

The hook attached to the bug requires that you include a short string token in your commit message. The token is generated as a function of the current time, your LDAP name, and a local secret. If you don't include the token, the hook will reject your second push, remind you that you can cancel your previous jobs, and give you the token along with the time at which it expires.
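To illustrate the idea, here is a minimal sketch of such a time-limited token scheme. The token lifetime, secret, and HMAC construction are assumptions for illustration, not the actual hook implementation from the bug:

```python
import hashlib
import hmac
import time

# Assumed values -- the real hook's lifetime and secret are unknown.
TOKEN_LIFETIME = 3600  # seconds a token stays valid
SECRET = b"local-secret-known-only-to-the-hook"

def make_token(ldap_name, now=None):
    """Derive a short token from the current time window, the user's
    LDAP name, and the local secret."""
    now = int(now if now is not None else time.time())
    window = now // TOKEN_LIFETIME  # changes once per lifetime
    msg = ("%s:%d" % (ldap_name, window)).encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return digest[:8]  # short enough to paste into a commit message

def check_token(ldap_name, token, now=None):
    """Accept tokens from the current or previous window, so a token
    handed out just before a window boundary still works."""
    now = int(now if now is not None else time.time())
    current = now // TOKEN_LIFETIME
    return any(
        hmac.compare_digest(make_token(ldap_name, w * TOKEN_LIFETIME), token)
        for w in (current, current - 1)
    )
```

The hook would call `check_token` on the commit message of a second push and reject the push when it returns False, printing the token `make_token` produces so the user can retry with an override.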

Surface [the leaderboard of try abusers] on tbpl, clearly visible on the 
inbound pushes. Public shaming ftw.

The intent here is definitely not public shaming. More like public awareness. I'm in no position to judge if all those pushes are using try effectively.

If we're going to hold anyone publicly accountable, I think it should
be the teams which are responsible for ensuring we have enough
resources to run builds and tests.

We're all trying to build the best system we can here. We've been publishing as much raw data as we can, as well as reports like wait time data for ages. We're not trying to hide this stuff away. At the same time, it's impossible to give any kind of SLA when the build/test load is unbounded.

We should have a public dashboard showing end-to-end tryserver times
-- starting with a push, how long did it take for all the requested
tests to complete?  And we should surface not only the mean but also
quantiles -- for example, the 90th-percentile end-to-end time, i.e. the
time within which 90% of pushes finished.
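The quantile computation such a dashboard would need is straightforward. The push times below are invented for illustration; a real dashboard would pull per-push end-to-end times (push timestamp to last requested job finishing) from the build data:

```python
import math

def percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) of values,
    using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical end-to-end times for ten try pushes, in minutes.
push_times = [42, 55, 61, 73, 80, 95, 110, 150, 240, 420]

mean = sum(push_times) / len(push_times)
p50 = percentile(push_times, 50)
p90 = percentile(push_times, 90)
```

Note how the mean (about 133 minutes here) hides the tail: the 90th percentile is far worse, which is exactly why surfacing quantiles matters.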

I can take another stab at this. However, I'm not sure that try is the best branch to do this on, since the best-case end-to-end time varies drastically depending on which platforms/tests were selected, and on whether the user opted to cancel or rebuild jobs later.

I understand an intern worked on an approximation of this, but didn't
entirely get there, so his tool hasn't been publicly released.

If the expectation is that developers should be accountable for the
resources they use, I think it's only fair that releng/it be
accountable for the resources they provide.

We've seen that where we don't have tracking -- e.g. for how long it
takes to push to try [1], or basically for anything else at Mozilla --
we often regress the metric we're interested in.  You make what you
measure.  If we want consistently fast try pushes, it's hard to
imagine how we'd get there without public data monitoring exactly the
thing we're interested in.

I'm sure IT would love some help in figuring out how to measure this.

Cheers,
Chris

-Justin

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=691459


_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform