On Thu, Jun 5, 2014 at 3:05 PM, Kyle Mestery <mest...@noironetworks.com> wrote:
> On Thu, Jun 5, 2014 at 7:07 AM, Sean Dague <s...@dague.net> wrote:
> > You may all have noticed things are really backed up in the gate right
> > now, and you would be correct. (Top of gate is about 30 hrs, but if you
> > do the math on ingress / egress rates the gate is probably really double
> > that in transit time right now).
> >
> > We've hit another threshold where there are so many really small races
> > in the gate that they are compounding to the point where fixing one is
> > often failed by another one killing your job. This whole situation was
> > exacerbated by the fact that while the transition from HP cloud 1.0 ->
> > 1.1 was happening and we were under capacity, the check queue grew to
> > 500 with lots of stuff being approved.
> >
> > That flush all hit the gate at once. But it also means that those jobs
> > passed in a very specific timing situation, which is different on the
> > new HP cloud nodes. And the normal statistical distribution of some jobs
> > on RAX and some on HP that shake out different races didn't happen.
> >
> > At this point we could really use help getting focus on only recheck
> > bugs. The current list of bugs is here:
> > http://status.openstack.org/elastic-recheck/
> >
> > Also our categorization rate is only 75% so there are probably at least
> > 2 critical bugs we don't even know about yet hiding in the failures.
> > Helping categorize here -
> > http://status.openstack.org/elastic-recheck/data/uncategorized.html
> > would be handy.
> >
> > We're coordinating changes via an etherpad here -
> > https://etherpad.openstack.org/p/gatetriage-june2014
> >
> > If you want to help, jumping in #openstack-infra would be the place to
> > go.
> >
> For the Neutron "ssh timeout" issue [1], we think we know why it's
> spiked recently. This tempest change [2] may have made the situation
> worse. We'd like to propose reverting that change with the review here
> [3], at which point we can resubmit it and continue debugging this.
> But this should help relieve the pressure caused by the recent surge
> in this bug.
>
> Does this sound like a workable plan to get things moving again?

As we discussed on IRC yes, and thank you for hunting this one down.

> Thanks,
> Kyle
>
> [1] https://bugs.launchpad.net/bugs/1323658
> [2] https://review.openstack.org/#/c/90427/
> [3] https://review.openstack.org/#/c/97245/
>
> > -Sean
> >
> > --
> > Sean Dague
> > http://dague.net
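For anyone curious about the queue math Sean alludes to, here is a rough back-of-the-envelope sketch. The numbers are made up for illustration (they are not measurements from this thread): by Little's Law, the average time a change spends in the gate is roughly the queue length divided by the egress (merge) rate, so when ingress outpaces egress the real transit time can be well beyond the age of the change at the head of the queue.

    # Back-of-the-envelope gate transit estimate (illustrative numbers only).
    # Little's Law: average time in system ~= items in queue / throughput.

    def estimated_transit_hours(queue_length, merges_per_hour):
        """Average hours a change spends in the gate, assuming a steady rate."""
        return queue_length / merges_per_hour

    # Hypothetical figures, not taken from this thread:
    queue_length = 120      # changes sitting in the gate queue
    merges_per_hour = 2.0   # egress rate (changes actually merging per hour)

    print("~%.0f hours to transit the gate"
          % estimated_transit_hours(queue_length, merges_per_hour))
    # If changes are approved (ingress) faster than they merge (egress), the
    # queue keeps growing, and a change entering now will wait even longer
    # than the ~30 hr age of the change currently at the top of the gate.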