I had the labels wrong - here's a slightly better link - http://bit.ly/1gdxYeg
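Each target in that graph is built the same way (a 6-hour moving average of a job's SUCCESS count as a percent of SUCCESS+FAILURE); the long URL quoted below just repeats that pattern per job. A rough Python sketch of how one such target string could be assembled - the helper name and the two jobs picked here are illustrative, not part of any existing tool:

    # Illustrative helper: build one Graphite target of the form used in the
    # gate success-rate graph above (6-hour moving average of SUCCESS as a
    # percent of SUCCESS+FAILURE for a zuul job), with a label and a color.
    def success_rate_target(job, color, pipeline='gate', window='6hours'):
        base = 'stats.zuul.pipeline.%s.job.%s' % (pipeline, job)
        return ("target=color(alias(movingAverage(asPercent("
                "%s.SUCCESS,sum(%s.{SUCCESS,FAILURE})),'%s'),'%s'),'%s')"
                % (base, base, window, job, color))

    url = ('http://graphite.openstack.org/graphlot/?from=-120hour&until=-0hour&'
           + '&'.join([
               success_rate_target('gate-tempest-dsvm-full', 'ED9121'),
               success_rate_target('gate-tempest-dsvm-neutron', '00FF00'),
           ]))
    print(url)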
On Fri, Dec 6, 2013 at 4:31 PM, Davanum Srinivas <dava...@gmail.com> wrote:
> Joe,
>
> Looks like we may be a bit more stable now?
>
> Short URL: http://bit.ly/18qq4q2
>
> Long URL:
> http://graphite.openstack.org/graphlot/?from=-120hour&until=-0hour&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-full.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-full.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-postgres-full'),'ED9121')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-postgres-full.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-postgres-full.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron-large-ops'),'00F0F0')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron'),'00FF00')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron-large-ops.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron-large-ops.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron-large-ops'),'00c868')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.check-grenade-dsvm.SUCCESS,sum(stats.zuul.pipeline.check.job.check-grenade-dsvm.{SUCCESS,FAILURE})),'6hours'),%20'check-grenade-dsvm'),'800080')&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-large-ops.SUCCESS,sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-large-ops.{SUCCESS,FAILURE})),'6hours'),%20'gate-tempest-dsvm-neutron-large-ops'),'E080FF')
>
> -- dims
>
> On Fri, Dec 6, 2013 at 11:28 AM, Matt Riedemann
> <mrie...@linux.vnet.ibm.com> wrote:
>>
>> On Wednesday, December 04, 2013 7:22:23 AM, Joe Gordon wrote:
>>>
>>> TL;DR: The gate is failing 23% of the time due to bugs in nova, neutron,
>>> and tempest. We need help fixing these bugs.
>>>
>>> Hi All,
>>>
>>> Before going any further, we have a bug that is affecting the gate and
>>> stable, so it's getting top priority here. elastic-recheck currently
>>> doesn't track unit tests because we don't expect them to fail very
>>> often. Turns out that assessment was wrong; we now have a nova py27
>>> unit test bug in both the gate and the stable gate.
>>>
>>> https://bugs.launchpad.net/nova/+bug/1216851
>>> Title: nova unit tests occasionally fail migration tests for mysql and
>>> postgres
>>> Hits
>>> FAILURE: 74
>>> The failures appear multiple times for a single job, and some of those
>>> are due to bad patches in the check queue. But this is being seen in
>>> both the stable and trunk gates, so something is definitely wrong.
>>>
>>> =======
>>>
>>> It's time for another edition of 'Top Gate Bugs.' I am sending this
>>> out now because, in addition to our usual gate bugs, a few new ones have
>>> cropped up recently, and as we saw a few weeks ago it doesn't take
>>> very many new bugs to wedge the gate.
>>>
>>> Currently the gate has a failure rate of at least 23%! [0]
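Footnote [0] at the bottom of Joe's mail spells out where that number comes from: the gate fails if any one of the jobs fails, so the combined failure rate is one minus the product of the per-job success rates. A quick sketch of that arithmetic, using the per-job failure percentages listed in the footnote:

    # Combined gate failure rate = 1 - product of the per-job success rates.
    # Per-job failure percentages are taken from footnote [0] below.
    job_failure_pct = {
        'gate-tempest-dsvm-neutron': 0.00,
        'gate-tempest-dsvm-neutron-large-ops': 11.11,
        'gate-tempest-dsvm-full': 11.11,
        'gate-tempest-dsvm-large-ops': 4.55,
        'gate-tempest-dsvm-postgres-full': 10.00,
        'gate-grenade-dsvm': 0.00,
    }

    combined_success = 1.0
    for pct in job_failure_pct.values():
        combined_success *= 1.0 - pct / 100.0

    print('combined gate failure rate: %.1f%%'
          % (100.0 * (1.0 - combined_success)))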
>>>
>>> Note: this email was generated with
>>> http://status.openstack.org/elastic-recheck/ and
>>> 'elastic-recheck-success' [1]
>>>
>>> 1) https://bugs.launchpad.net/bugs/1253896
>>> Title: test_minimum_basic_scenario fails with SSHException: Error
>>> reading SSH protocol banner
>>> Projects: neutron, nova, tempest
>>> Hits
>>> FAILURE: 324
>>> This one has been around for several weeks now, and although we have
>>> made some attempts at fixing it, we aren't any closer to resolving
>>> it than we were a few weeks ago.
>>>
>>> 2) https://bugs.launchpad.net/bugs/1251448
>>> Title: BadRequest: Multiple possible networks found, use a Network ID
>>> to be more specific.
>>> Project: neutron
>>> Hits
>>> FAILURE: 141
>>>
>>> 3) https://bugs.launchpad.net/bugs/1249065
>>> Title: Tempest failure: tempest/scenario/test_snapshot_pattern.py
>>> Project: nova
>>> Hits
>>> FAILURE: 112
>>> This is a bug in nova's neutron code.
>>>
>>> 4) https://bugs.launchpad.net/bugs/1250168
>>> Title: gate-tempest-devstack-vm-neutron-large-ops is failing
>>> Projects: neutron, nova
>>> Hits
>>> FAILURE: 94
>>> This is an old bug that was fixed but came back on December 3rd, so
>>> this is a recent regression. It may be an infra issue.
>>>
>>> 5) https://bugs.launchpad.net/bugs/1210483
>>> Title: ServerAddressesTestXML.test_list_server_addresses FAIL
>>> Projects: neutron, nova
>>> Hits
>>> FAILURE: 73
>>> This has had some attempts made at fixing it, but it's still around.
>>>
>>> In addition to the existing bugs, we have some new bugs on the rise:
>>>
>>> 1) https://bugs.launchpad.net/bugs/1257626
>>> Title: Timeout while waiting on RPC response - topic: "network", RPC
>>> method: "allocate_for_instance" info: "<unknown>"
>>> Project: nova
>>> Hits
>>> FAILURE: 52
>>> large-ops only bug. This has been around for at least two weeks, but
>>> we have seen it in higher numbers starting around December 3rd. This
>>> may be an infrastructure issue, as neutron-large-ops started failing
>>> more around the same time.
>>>
>>> 2) https://bugs.launchpad.net/bugs/1257641
>>> Title: Quota exceeded for instances: Requested 1, but already used 10
>>> of 10 instances
>>> Projects: nova, tempest
>>> Hits
>>> FAILURE: 41
>>> Like the previous bug, this has been around for at least two weeks but
>>> appears to be on the rise.
>>>
>>> Raw Data: http://paste.openstack.org/show/54419/
>>>
>>> best,
>>> Joe
>>>
>>> [0] failure rate = 1 - (success rate gate-tempest-dsvm-neutron) *
>>> (success rate ...) * ...
>>>
>>> gate-tempest-dsvm-neutron = 0.00
>>> gate-tempest-dsvm-neutron-large-ops = 11.11
>>> gate-tempest-dsvm-full = 11.11
>>> gate-tempest-dsvm-large-ops = 4.55
>>> gate-tempest-dsvm-postgres-full = 10.00
>>> gate-grenade-dsvm = 0.00
>>>
>>> (I hope I got the math right here)
>>>
>>> [1] http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/elastic_recheck/cmd/check_success.py
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev@lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>> Let's add bug 1257644 [1] to the list. I'm pretty sure this is due to some
>> recent code [2][3] in the nova libvirt driver that is automatically
>> disabling the host when the libvirt connection drops.
>>
>> Joe said there was a known issue with libvirt connection failures, so this
>> could be duped against that, but I'm not sure where/what that one is -
>> maybe bug 1254872 [4]?
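For anyone not familiar with the mechanism Matt is describing: the driver hooks libvirt's connection close callback and reacts when it fires. A minimal sketch, assuming the libvirt-python bindings' registerCloseCallback (as far as I can tell the callback receives an integer reason code rather than a string) - the function names below are illustrative, not nova's actual code:

    # Minimal sketch (not nova's actual code): register a libvirt close
    # callback and react when the connection drops. Delivery of the callback
    # relies on the libvirt event loop running.
    import libvirt

    libvirt.virEventRegisterDefaultImpl()

    CLOSE_REASONS = {
        libvirt.VIR_CONNECT_CLOSE_REASON_ERROR: 'I/O error',
        libvirt.VIR_CONNECT_CLOSE_REASON_EOF: 'end of file',
        libvirt.VIR_CONNECT_CLOSE_REASON_KEEPALIVE: 'keepalive timeout',
        libvirt.VIR_CONNECT_CLOSE_REASON_CLIENT: 'client requested close',
    }

    def _close_callback(conn, reason, opaque):
        # 'reason' is an integer enum value (VIR_CONNECT_CLOSE_REASON_*).
        print('libvirt connection closed: %s'
              % CLOSE_REASONS.get(reason, 'unknown (%s)' % reason))
        # ...this is where a driver could mark the compute service disabled...

    conn = libvirt.openReadOnly('qemu:///system')
    conn.registerCloseCallback(_close_callback, None)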
>> Unless I just don't understand the code, there is some funny logic going
>> on in the libvirt driver when it's automatically disabling a host, which
>> I've documented in bug 1257644. It would help to have some libvirt-minded
>> people looking at that, or the authors/approvers of those patches.
>>
>> Also, does anyone know if libvirt will pass a 'reason' string to the
>> _close_callback function? I was digging through the libvirt code this
>> morning but couldn't figure out where the callback is actually called and
>> with what parameters. The code in nova seemed to just be based on the
>> patch that danpb had in libvirt [5].
>>
>> This bug is going to raise a bigger long-term question about the need for
>> a new column in the Service table indicating whether or not the service
>> was automatically disabled, as Phil Day points out in bug 1250049 [6].
>> That way the ComputeFilter in the scheduler could handle that case a bit
>> differently, at least from a logging/serviceability standpoint, e.g. an
>> info/warning-level message vs. a debug message.
>>
>> [1] https://bugs.launchpad.net/nova/+bug/1257644
>> [2] https://review.openstack.org/#/c/52189/
>> [3] https://review.openstack.org/#/c/56224/
>> [4] https://bugs.launchpad.net/nova/+bug/1254872
>> [5] http://www.redhat.com/archives/libvir-list/2012-July/msg01675.html
>> [6] https://bugs.launchpad.net/nova/+bug/1250049
>>
>> --
>>
>> Thanks,
>>
>> Matt Riedemann
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> --
> Davanum Srinivas :: http://davanum.wordpress.com

--
Davanum Srinivas :: http://davanum.wordpress.com

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev