Ciao Salvatore,

thanks a lot for analyzing the failures!
This link is not working for me:
7) https://bugs.launchpad.net/neutron/+bug/1253533

I took a minor bug that was not assigned. Most of the bugs are assigned to you, so I was wondering if you could use some help. I guess we can coordinate better when you are online.
cheers,

Rossella

On 02/23/2014 03:14 AM, Salvatore Orlando wrote:
I have tried to collect more information on neutron full job failures.

So far there have been 219 failures and 891 successes, for an overall failure rate of 19.7%, which is in line with Sean's evaluation. The count was performed exclusively on jobs executed against the master branch. The failure rate for stable/havana is higher; indeed the job there still triggers bug 1273386 as it performs nbd mounting, and several fixes for the l2/l3 agents were not backported (or not backportable).
It is worth noting that actually some of the failures were because of 
infra issues. Unfortunately, it is not obvious to me how to define a 
logstash query for that. Nevertheless, it will be better to err on the 
side of safety and estimate failure rate to be about 20%.
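
For reference, the overall figure can be re-derived from the two counts quoted above. This is only a minimal sketch: the function name is made up for illustration, and it assumes the rate is simply failures divided by total runs, with no correction for infra issues. With these counts it comes out just under 20%.

```python
def failure_rate(failures: int, successes: int) -> float:
    """Return failures as a percentage of all job runs."""
    total = failures + successes
    if total == 0:
        raise ValueError("no job runs to evaluate")
    return 100.0 * failures / total


# Counts quoted above for the neutron full job on master.
overall = failure_rate(failures=219, successes=891)
print(f"overall failure rate: {overall:.1f}%")
```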
I then classified 63 failures and found the following:
- 25 failures were due to infra issues and 1 failure was due to a flaw in a patch, leaving 37 "real" failures to analyse
  * In the same timeframe 203 jobs succeeded, giving a potential failure rate of 15.4% (37 out of 240) after excluding infra issues
- 2 bugs were responsible for 25 of these 37 failures
  * they are the "SSH protocol banner issue" and the well-known DB lock timeouts
- bug 1253896 (the infamous SSH timeout bug) was hit only twice. The elastic recheck count is much higher because failures for the SSH protocol banner error (1265495) are being classified as bug 1253896.
  * actually, in the past 48 hours only 2 voting neutron jobs hit this failure. This is probably a great improvement compared with a few weeks ago.
- Some failures are due to bugs already known and tracked; other failures are due to bugs either unforeseen so far or not tracked. In the latter case a bug report has been filed.
It seems therefore that there are two high priority bugs to address:
1) https://bugs.launchpad.net/neutron/+bug/1283522 (16 occurrences, 43.2% of failures, 6.67% globally)
   * Check whether we can resume the discussion on the split between API server and RPC server
2) https://bugs.launchpad.net/neutron/+bug/1265495 (9/37 = 24.3% of failures, 3.75% globally)
There are also several minor bugs (affecting tempest and/or neutron). Each of the following was found no more than twice in our analysis:
3) https://bugs.launchpad.net/neutron/+bug/1254890 (possibly a nova bug, but it hit the neutron full job once)
4) https://bugs.launchpad.net/neutron/+bug/1283599
5) https://bugs.launchpad.net/neutron/+bug/1277439
6) https://bugs.launchpad.net/neutron/+bug/1253896
7) https://bugs.launchpad.net/neutron/+bug/1253533
8) https://bugs.launchpad.net/tempest/+bug/1283535 (possibly not a neutron bug)
9) https://bugs.launchpad.net/tempest/+bug/1253993 (need to devise new solutions for improving agent loop times)
   * there is already a patch under review for bulking device details requests
10) https://bugs.launchpad.net/neutron/+bug/1283518
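
The percentages quoted for bugs #1 and #2 can be checked with a few lines of arithmetic. This is a rough sketch, not part of the original analysis: the variable names are illustrative, and it assumes "globally" means the 37 real failures plus the 203 successes from the same timeframe (240 runs), which is the denominator that reproduces the 6.67% and 3.75% figures.

```python
# Counts taken from the classification above.
classified = 63        # failures that were classified
infra = 25             # failures caused by infra issues
bad_patch = 1          # failure caused by a flaw in a patch
successes = 203        # successful jobs in the same timeframe

real_failures = classified - infra - bad_patch
# Assumption: "globally" = real failures + successes.
runs = real_failures + successes

# Occurrences of the two high priority bugs.
occurrences = {"1283522": 16, "1265495": 9}

shares = {}
for bug, hits in occurrences.items():
    of_failures = 100.0 * hits / real_failures
    of_runs = 100.0 * hits / runs
    shares[bug] = (of_failures, of_runs)
    print(f"bug {bug}: {of_failures:.1f}% of failures, {of_runs:.2f}% globally")
```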

In my humble opinion, it is therefore important to immediately have a plan for ensuring that bugs #1 and #2 are solved, or at least consistently mitigated, by icehouse. It would also be good to identify assignees for bugs #3 through #10.
Regards,
Salvatore


On 21 February 2014 14:44, Sean Dague <s...@dague.net> wrote:
    Yesterday during the QA meeting we realized that the neutron full job,
    which includes tenant isolation, and full parallelism, was passing quite
    often in the experimental queue. Which was actually news to most of us,
    as no one had been keeping a close eye on it.

    I moved that to a non-voting job on all projects. A spot check overnight
    is that it's failing about twice as often as the regular neutron job.
    Which is too high a failure rate to make it voting, but it's close.

    This would be the time for a final hard push by the neutron team to get
    to the bottom of these failures to bring the pass rate to the level of
    the existing neutron job, then we could make neutron full voting.

    This is a *huge* move forward from where things were at the Havana
    summit. I want to thank the Neutron team for getting so aggressive about
    getting this testing working. I was skeptical we could get there within
    the cycle, but a last push could actually get us neutron parity in the
    gate by i3.

            -Sean

    --
    Sean Dague
    Samsung Research America
    s...@dague.net / sean.da...@samsung.com
    http://dague.net


    _______________________________________________
    OpenStack-dev mailing list
    OpenStack-dev@lists.openstack.org
    <mailto:OpenStack-dev@lists.openstack.org>
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



