Joe,

Looks like we may be a bit more stable now?
Short URL: http://bit.ly/18qq4q2

Long URL: http://graphite.openstack.org/graphlot/?from=-120hour&until=-0hour&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-full.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-full.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-full'),'ED9121')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-postgres-full.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-postgres-full.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-postgres-full'),'00F0F0')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron'),'00FF00')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron-large-ops.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron-large-ops.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron-large-ops'),'00c868')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.check-grenade-dsvm.SUCCESS,sum(stats.zuul.pipeline.check.job.check-grenade-dsvm.{SUCCESS,FAILURE})),'6hours'),%20'check-grenade-dsvm'),'800080')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-large-ops.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-large-ops.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-large-ops'),'E080FF')

-- dims

On Fri, Dec 6, 2013 at 11:28 AM, Matt Riedemann <mrie...@linux.vnet.ibm.com> wrote:
>
> On Wednesday, December 04, 2013 7:22:23 AM, Joe Gordon wrote:
>>
>> TL;DR: Gate is failing 23% of the time due to bugs in nova, neutron
>> and tempest. We need help fixing these bugs.
>>
>> Hi All,
>>
>> Before going any further, we have a bug that is affecting the gate and
>> stable branches, so it's getting top priority here.
>> elastic-recheck currently doesn't track unit tests because we don't
>> expect them to fail very often. It turns out that assessment was wrong:
>> we now have a nova py27 unit test bug in both the trunk and stable gates.
>>
>> https://bugs.launchpad.net/nova/+bug/1216851
>> Title: nova unit tests occasionally fail migration tests for mysql and
>> postgres
>> Hits
>> FAILURE: 74
>> The failures appear multiple times for a single job, and some of those
>> are due to bad patches in the check queue. But this is being seen in
>> both the stable and trunk gates, so something is definitely wrong.
>>
>> =======
>>
>> It's time for another edition of 'Top Gate Bugs.' I am sending this
>> out now because, in addition to our usual gate bugs, a few new ones have
>> cropped up recently, and as we saw a few weeks ago it doesn't take
>> very many new bugs to wedge the gate.
>>
>> Currently the gate has a failure rate of at least 23%! [0]
>>
>> Note: this email was generated with
>> http://status.openstack.org/elastic-recheck/ and
>> 'elastic-recheck-success' [1]
>>
>> 1) https://bugs.launchpad.net/bugs/1253896
>> Title: test_minimum_basic_scenario fails with SSHException: Error
>> reading SSH protocol banner
>> Projects: neutron, nova, tempest
>> Hits
>> FAILURE: 324
>> This one has been around for several weeks now, and although we have
>> made some attempts at fixing it, we aren't any closer to resolving
>> it than we were a few weeks ago.
>>
>> 2) https://bugs.launchpad.net/bugs/1251448
>> Title: BadRequest: Multiple possible networks found, use a Network ID
>> to be more specific.
>> Project: neutron
>> Hits
>> FAILURE: 141
>>
>> 3) https://bugs.launchpad.net/bugs/1249065
>> Title: Tempest failure: tempest/scenario/test_snapshot_pattern.py
>> Project: nova
>> Hits
>> FAILURE: 112
>> This is a bug in nova's neutron code.
>>
>> 4) https://bugs.launchpad.net/bugs/1250168
>> Title: gate-tempest-devstack-vm-neutron-large-ops is failing
>> Projects: neutron, nova
>> Hits
>> FAILURE: 94
>> This is an old bug that was fixed but came back on December 3rd, so
>> this is a recent regression. It may be an infra issue.
>>
>> 5) https://bugs.launchpad.net/bugs/1210483
>> Title: ServerAddressesTestXML.test_list_server_addresses FAIL
>> Projects: neutron, nova
>> Hits
>> FAILURE: 73
>> This has had some attempts made at fixing it, but it's still around.
>>
>> In addition to the existing bugs, we have some new bugs on the rise:
>>
>> 1) https://bugs.launchpad.net/bugs/1257626
>> Title: Timeout while waiting on RPC response - topic: "network", RPC
>> method: "allocate_for_instance" info: "<unknown>"
>> Project: nova
>> Hits
>> FAILURE: 52
>> This is a large-ops-only bug. It has been around for at least two weeks,
>> but we have seen it in higher numbers starting around December 3rd. It
>> may be an infrastructure issue, as the neutron-large-ops job started
>> failing more around the same time.
>>
>> 2) https://bugs.launchpad.net/bugs/1257641
>> Title: Quota exceeded for instances: Requested 1, but already used 10
>> of 10 instances
>> Projects: nova, tempest
>> Hits
>> FAILURE: 41
>> Like the previous bug, this has been around for at least two weeks but
>> appears to be on the rise.
>>
>> Raw Data: http://paste.openstack.org/show/54419/
>>
>> best,
>> Joe
>>
>> [0] failure rate = 1 - (success rate gate-tempest-dsvm-neutron) * (success
>> rate ...) * ...
>>
>> gate-tempest-dsvm-neutron = 0.00
>> gate-tempest-dsvm-neutron-large-ops = 11.11
>> gate-tempest-dsvm-full = 11.11
>> gate-tempest-dsvm-large-ops = 4.55
>> gate-tempest-dsvm-postgres-full = 10.00
>> gate-grenade-dsvm = 0.00
>>
>> (I hope I got the math right here)
>>
>> [1] http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/elastic_recheck/cmd/check_success.py
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> Let's add bug 1257644 [1] to the list. I'm pretty sure this is due to some
> recent code [2][3] in the nova libvirt driver that automatically
> disables the host when the libvirt connection drops.
>
> Joe said there was a known issue with libvirt connection failures, so this
> could be duped against that, but I'm not sure where/what that one is - maybe
> bug 1254872 [4]?
>
> Unless I just don't understand the code, there is some funny logic going on
> in the libvirt driver when it automatically disables a host, which I've
> documented in bug 1257644. It would help to have some libvirt-minded people
> looking at that, or the authors/approvers of those patches.
>
> Also, does anyone know if libvirt will pass a 'reason' string to the
> _close_callback function? I was digging through the libvirt code this
> morning but couldn't figure out where the callback is actually called and
> with what parameters. The code in nova seemed to just be based on the patch
> that danpb had in libvirt [5].
>
> This bug is going to raise a bigger long-term question about the need for
> a new column in the Service table indicating whether or not the
> service was automatically disabled, as Phil Day points out in bug 1250049
> [6]. That way the ComputeFilter in the scheduler could handle that case a
> bit differently, at least from a logging/serviceability standpoint, e.g.
> an info/warning level message vs debug.
>
> [1] https://bugs.launchpad.net/nova/+bug/1257644
> [2] https://review.openstack.org/#/c/52189/
> [3] https://review.openstack.org/#/c/56224/
> [4] https://bugs.launchpad.net/nova/+bug/1254872
> [5] http://www.redhat.com/archives/libvir-list/2012-July/msg01675.html
> [6] https://bugs.launchpad.net/nova/+bug/1250049
>
> --
>
> Thanks,
>
> Matt Riedemann

--
Davanum Srinivas :: http://davanum.wordpress.com
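P.S. The combined-failure arithmetic from footnote [0] can be sketched as follows. This is a minimal illustration, not elastic-recheck code: it plugs in the per-job failure percentages quoted above and assumes the jobs fail independently, and the helper function name is made up for the example.

```python
# Footnote [0]: overall gate failure = 1 - product of per-job success rates.
# Per-job failure percentages as quoted in the email.
job_failure_pct = {
    "gate-tempest-dsvm-neutron": 0.00,
    "gate-tempest-dsvm-neutron-large-ops": 11.11,
    "gate-tempest-dsvm-full": 11.11,
    "gate-tempest-dsvm-large-ops": 4.55,
    "gate-tempest-dsvm-postgres-full": 10.00,
    "gate-grenade-dsvm": 0.00,
}

def combined_failure_rate(failures_pct):
    """Probability that at least one gate job fails, assuming
    the jobs fail independently of one another."""
    success = 1.0
    for pct in failures_pct.values():
        success *= 1.0 - pct / 100.0
    return 1.0 - success

rate = combined_failure_rate(job_failure_pct)
print(f"combined gate failure rate: {rate:.1%}")
```

With the percentages quoted above this comes out around 32%, consistent with the "at least 23%" lower bound in the email (the 23% figure was presumably computed from an earlier snapshot of the per-job rates).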