On 2014-04-30 16:52 , Daniel Holbert wrote:
> On 03/07/2014 02:41 PM, Hal Wine wrote:
>> On 2014-02-28 17:24 , Hal Wine wrote:
>>> tl;dr: what is the balance point between pushes to try taking too long
>>> and losing repository history of recent try pushes?
>> Based on the responses to this specific question, we'll go back to
>> waiting for developers to notify IT when there is enough performance
>> impact to warrant a reset of the try repository

Thanks for reopening this thread.

>
> As documented on
> https://bugzilla.mozilla.org/show_bug.cgi?id=994028
> we've now had multiple instances in the past few weeks where Try has
> been horked (refusing all pushes) for hours at a time, with no clear
> reason why.
>
> I'm not sure if this is caused by Try having too many heads & needing a
> reset, but it seems like it could be. (It also could be *indirectly*
> caused by the too-many-heads issue, too; e.g. perhaps someone
> interrupted a push because it was taking too long (due to too many
> heads), and their client inadvertently left something on the server
> locked, which then locks everyone else out for hours.)
>
> Whatever the cause, it's feeling more and more like periodic, automatic
> Try resets would be helpful to keep things running smoothly.

Yes, or a better working definition of "too much performance impact". In
this case, we had a 4h10m gap in the pushlog, and now things are back to
"normal". A reset would take about that long to perform.

>
> Would it be possible to set up a system along the lines of dbaron's
> suggestion earlier in this post? (Frequent resets, with a post-reset
> step to pull in the most recent ~2 weeks worth of heads from the old
> repo, so that people's try pushes don't mysteriously disappear if they
> happen to push right before a reset.)

It is something that could be tried - we'll try a few dry runs to see how
much this adds to the reset try duration (given that we have to pull
those changes from the "slow repo").

I also have some fresh thoughts on https://bugzil.la/691459 - there may
be some log correlation possible to get us hard data on overall success
rates and push times.

Looking forward to getting a newer (and hopefully better) approach to
this recurring issue.

--Hal
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform