Ciao Salvatore,
thanks a lot for analyzing the failures!
This link is not working for me:
7) https://bugs.launchpad.net/neutron/+bug/1253533
I took a minor bug that was not assigned. Most of the bugs are assigned
to you, so I was wondering if you could use some help. I guess we can
coordinate better when you are online.
cheers,
Rossella
On 02/23/2014 03:14 AM, Salvatore Orlando wrote:
I have tried to collect more information on neutron full job failures.
So far there have been 219 failures and 891 successes, for an overall
failure rate of about 19.7%, which is in line with Sean's evaluation.
The count was performed exclusively on jobs executed against the master
branch. The failure rate for stable/havana is higher; indeed the job
there still triggers bug 1273386 as it performs nbd mounting, and
several fixes for the l2/l3 agents were not backported (or not
backportable).
It is worth noting that some of the failures were actually caused by
infra issues. Unfortunately, it is not obvious to me how to define a
logstash query for that. Nevertheless, it will be better to err on the
side of safety and estimate the failure rate to be about 20%.
I then classified 63 failures and found the following:
- 25 failures were for infra issues, 1 failure was due to a flaw in a
patch, leaving 37 "real" failures to analyse
* In the same timeframe 203 jobs succeeded, giving a potential
failure rate after excluding infra issues of 15.7%
- 2 bugs were responsible for 25 of these 37 failures
* they are the "SSH protocol banner issue", and the well-known DB
lock timeouts
- bug 1253896 (the infamous SSH timeout bug) was hit only twice. The
elastic recheck count is much higher because failures for the SSH
protocol banner error (1265495) are being classified as bug 1253896.
* actually in the past 48 hours only 2 voting neutron jobs hit this
failure. This is probably a great improvement compared with a few
weeks ago.
- Some failures are due to bugs already known and tracked; other
failures are due to bugs either unforeseen so far or not tracked. In
the latter case a bug report has been filed.
It seems therefore that there are two high priority bugs to address:
1) https://bugs.launchpad.net/neutron/+bug/1283522 (16 occurrences,
43.2% of failures, 6.67% globally)
* Check whether we can resume the discussion on the split between
API server and RPC server
2) https://bugs.launchpad.net/neutron/+bug/1265495 (9/37 = 24.3% of
failures, 3.75% globally)
And several minor bugs (affecting tempest and/or neutron)
Each one of the following bugs was found no more than twice in our
analysis:
3) https://bugs.launchpad.net/neutron/+bug/1254890 (possibly a nova
bug, but it hit the neutron full job once)
4) https://bugs.launchpad.net/neutron/+bug/1283599
5) https://bugs.launchpad.net/neutron/+bug/1277439
6) https://bugs.launchpad.net/neutron/+bug/1253896
7) https://bugs.launchpad.net/neutron/+bug/1253533
8) https://bugs.launchpad.net/tempest/+bug/1283535 (possibly not a
neutron bug)
9) https://bugs.launchpad.net/tempest/+bug/1253993 (need to devise new
solutions for improving agent loop times)
* there is already a patch under review for bulking device details
requests
10) https://bugs.launchpad.net/neutron/+bug/1283518
In my humble opinion, it is therefore important to immediately have a
plan for ensuring bugs #1 and #2 are solved or at least consistently
mitigated by icehouse. It would also be good to identify assignees for
bug #3 to bug #10.
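For anyone who wants to re-derive the percentages above from the raw
counts, here is a minimal sketch in Python. It assumes the "global"
denominator is the 37 real failures plus the 203 successes from the same
timeframe (240 jobs); that reading reproduces the 6.67% and 3.75%
figures, but it is my assumption and not something stated explicitly.
Under the same assumption the real failure rate comes out closer to
15.4% than to the 15.7% quoted, so the quoted figure may have used a
slightly different denominator. The helper name `rate` is only
illustrative.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts quoted in the message.
total_failures, total_successes = 219, 891
real_failures, sample_successes = 37, 203

# Assumption: "globally" means real failures plus successes in the sample.
sample_jobs = real_failures + sample_successes

overall_failure_rate = rate(total_failures, total_failures + total_successes)
real_failure_rate = rate(real_failures, sample_jobs)

# Occurrences of the two high priority bugs among the 37 real failures.
bug_counts = {"1283522": 16, "1265495": 9}

# For each bug: (share of real failures, share of all sampled jobs).
shares = {
    bug: (rate(count, real_failures), rate(count, sample_jobs))
    for bug, count in bug_counts.items()
}

if __name__ == "__main__":
    print("overall failure rate:", overall_failure_rate)
    print("real failure rate:", real_failure_rate)
    for bug, (of_failures, of_jobs) in shares.items():
        print(f"bug {bug}: {of_failures}% of failures, {of_jobs}% globally")
```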
Regards,
Salvatore
On 21 February 2014 14:44, Sean Dague <s...@dague.net> wrote:
Yesterday during the QA meeting we realized that the neutron full job,
which includes tenant isolation, and full parallelism, was passing
quite often in the experimental queue. Which was actually news to most
of us, as no one had been keeping a close eye on it.
I moved that to a non-voting job on all projects. A spot check
overnight is that it's failing about twice as often as the regular
neutron job. Which is too high a failure rate to make it voting, but
it's close.
This would be the time for a final hard push by the neutron team to
get to the bottom of these failures to bring the pass rate to the level
of the existing neutron job, then we could make neutron full voting.
This is a *huge* move forward from where things were at the Havana
summit. I want to thank the Neutron team for getting so aggressive
about getting this testing working. I was skeptical we could get there
within the cycle, but a last push could actually get us neutron parity
in the gate by i3.
-Sean
--
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev