Re: [openstack-dev] State of the Gate - Dec 12

Matt Riedemann Thu, 12 Dec 2013 10:26:18 -0800


On 12/12/2013 7:20 AM, Sean Dague wrote:

Current Gate Length: 12hrs*, 41 deep

(top of gate entered 12hrs ago)

It's been an *exciting* week. For people not paying attention, we had
2 external events which made things terrible earlier in the week.

==========================
Event 1: sphinx 1.2 complete breakage - MOSTLY RESOLVED
==========================

It turns out sphinx 1.2 + distutils (which pbr magic calls through) means
total sadness. The fix for this was a requirements pin to sphinx < 1.2,
and until a project has taken that pin, it will fail in the gate.

It also turns out that tox installs pre-released software by default (a
terrible default behavior), so you also need a tox.ini change like this
- https://github.com/openstack/nova/blob/master/tox.ini#L9 - otherwise
local users will install things like sphinx 1.2b3, and they will also
break in other ways.

Not all projects have merged this. If you are a project that hasn't,
please don't send any other jobs to the gate until you do. A lot of
delay was added to the gate yesterday by Glance patches being pushed to
the gate before their doc jobs were done.

==========================
Event 2: apt.puppetlabs.com outage - RESOLVED
==========================

We use that apt repository to set up the devstack nodes in nodepool with
puppet. We were triggering an issue with grenade where its apt-get
calls were failing, because it does apt-get update once to make sure
life is good. This only triggered in grenade (not other devstack runs)
because we do set -o errexit aggressively.

A fix in grenade to ignore these errors was merged yesterday afternoon
(it's the purple line at http://status.openstack.org/elastic-recheck/,
where you can see when it showed up).
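
As an illustration of the errexit behavior described above, here is a minimal Python sketch. It is not the actual grenade patch (grenade is a shell script); the function names run_strict and run_tolerant are made up for this example. run_strict aborts on any non-zero exit status, much like a script under set -o errexit, while run_tolerant logs the failure and carries on, which is the spirit of ignoring a failed apt-get update.

```python
import subprocess
import sys


def run_strict(cmd):
    """Run cmd and raise CalledProcessError on a non-zero exit status.

    This mirrors a shell script running under `set -o errexit`.
    """
    subprocess.run(cmd, check=True)


def run_tolerant(cmd):
    """Run cmd, warn on a non-zero exit status, and keep going.

    Returns True if the command succeeded, False otherwise.
    """
    result = subprocess.run(cmd, check=False)
    if result.returncode != 0:
        print("warning: %s exited with %d, continuing"
              % (cmd[0], result.returncode), file=sys.stderr)
        return False
    return True
```

With this split, a step whose failure is harmless (a repository index refresh against a mirror that happens to be down) would go through run_tolerant, and everything else would stay strict.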

==========================
Top Gate Bugs
==========================

We normally do this as a list, and you can see the whole list here -
http://status.openstack.org/elastic-recheck/ (now sorted by number of
FAILURES in the last 2 weeks)

That being said, our biggest race bug is currently this one bug -
https://bugs.launchpad.net/tempest/+bug/1253896 - and if you want to
merge patches, fixing that one bug will be huge.

Basically, you can't ssh into guests that get created. That's sort of a
fundamental property of a cloud. It shows up more frequently on neutron
jobs, possibly due to actually testing the metadata server path. There
have been many attempts at retry logic for this; we actually retry for
196 seconds and only fail if we still can't get in, so waiting
isn't helping. It doesn't seem like the env is under that much load.
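
To make the retry behavior concrete, here is a minimal Python sketch of retrying until a deadline. It is not tempest's actual code: wait_for_ssh, try_connect, and the interval value are assumptions for illustration, and only the 196-second figure comes from the text above. The clock and sleep functions are parameters so the loop can be exercised without really waiting.

```python
import time


def wait_for_ssh(try_connect, timeout=196.0, interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Call try_connect() until it returns True or timeout seconds elapse.

    Returns the number of attempts on success; raises TimeoutError once
    the deadline has passed and the guest is still unreachable.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if try_connect():
            return attempts
        if clock() >= deadline:
            raise TimeoutError("no ssh after %d attempts in %.0fs"
                               % (attempts, timeout))
        sleep(interval)
```

If the guest never becomes reachable, a loop like this fails the same way no matter how large the timeout is, which is consistent with the observation that waiting longer isn't helping.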

Until we resolve this, life will not be good in landing patches.

        -Sean



_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

There have been a few threads [1][2] on gate failures and the process around what happens when we go about identifying, tracking and fixing them.

I couldn't find anything outside of the mailing list to keep a record of this so started a page here [3].

Feel free to contribute so we can point people to how they can easily help in working these faster.

[1] http://lists.openstack.org/pipermail/openstack-dev/2013-November/020280.html

[2] http://lists.openstack.org/pipermail/openstack-dev/2013-November/019931.html

[3] https://wiki.openstack.org/wiki/ElasticRecheck

--

Thanks,

Matt Riedemann

