On 8 July 2015 at 14:30, Salvatore Orlando <sorla...@nicira.com> wrote:
> I agree, and I would make the switch as soon as possible. The graphite
> graph you posted shows that since 6/28 the difference in failure rate
> is so small that it isn't even statistically significant. However,
> spikes in the failure rate of the unstable job also suggest that you
> are starting to chase a moving target, and we know how painful that is
> from the experience we had when enabling the neutron full job.

The spike was induced by an infrastructure failure, but generally
speaking I agree with you.

> Salvatore
>
> On 8 July 2015 at 20:21, Armando M. <arma...@gmail.com> wrote:
>
>> Hi,
>>
>> Another brief update on the matter:
>>
>> Failure rate trends [1] show that the unstable configuration (with
>> multiple API workers and the pymysql driver) and the stable
>> configuration (without them) are now virtually aligned, so I am
>> proposing that it is time to drop the unstable infra configuration
>> [2,3] that allowed the team to triage, experiment, and get to a
>> solution. I'll watch [1] a little longer before claiming that we are
>> out of the woods.
>>
>> Cheers,
>> Armando
>>
>> [1] http://goo.gl/YM7gUC
>> [2] https://review.openstack.org/#/c/199668/
>> [3] https://review.openstack.org/#/c/199672/
>>
>> On 22 June 2015 at 14:10, Armando M. <arma...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> A brief update on the issue that sparked this thread:
>>>
>>> A little over a week ago, bug [1] was filed. The gist of it is that
>>> the switch to pymysql unveiled a number of latent race conditions
>>> that made Neutron unstable.
>>>
>>> To try and nip these in the bud, the Neutron team filed a number of
>>> patches [2] to create an unstable configuration that would allow
>>> them to troubleshoot and experiment with a solution while still
>>> keeping stability in check (a preliminary proposal for a fix is
>>> available in [4]).
>>>
>>> The latest failure rate trend is shown in [3]; as you can see, we
>>> are still gathering data, but it seems that the instability gap
>>> between the two jobs (stable vs unstable) has widened, which should
>>> give us plenty of data points to devise a resolution strategy.
>>>
>>> I have documented the most recurrent traces in the bug report [1].
>>>
>>> I will update once we manage to get the two curves to kiss each
>>> other again, closer to a more acceptable failure rate.
>>>
>>> Cheers,
>>> Armando
>>>
>>> [1] https://bugs.launchpad.net/neutron/+bug/1464612
>>> [2] https://review.openstack.org/#/q/topic:neutron-unstable,n,z
>>> [3] http://goo.gl/YM7gUC
>>> [4] https://review.openstack.org/#/c/191540/
>>>
>>> On 12 June 2015 at 11:13, Boris Pavlovic <bpavlo...@mirantis.com> wrote:
>>>
>>>> Sean,
>>>>
>>>> Thanks for the quick fix/revert https://review.openstack.org/#/c/191010/;
>>>> this unblocked the Rally gates.
>>>>
>>>> Best regards,
>>>> Boris Pavlovic
>>>>
>>>> On Fri, Jun 12, 2015 at 8:56 PM, Clint Byrum <cl...@fewbar.com> wrote:
>>>>
>>>>> Excerpts from Mike Bayer's message of 2015-06-12 09:42:42 -0700:
>>>>> >
>>>>> > On 6/12/15 11:37 AM, Mike Bayer wrote:
>>>>> > >
>>>>> > > On 6/11/15 9:32 PM, Eugene Nikanorov wrote:
>>>>> > >> Hi neutrons,
>>>>> > >>
>>>>> > >> I'd like to draw your attention to an issue discovered by the
>>>>> > >> rally gate job:
>>>>> > >> http://logs.openstack.org/96/190796/4/check/gate-rally-dsvm-neutron-rally/7a18e43/logs/screen-q-svc.txt.gz?level=TRACE
>>>>> > >>
>>>>> > >> I don't have the bandwidth to take a deep look at it, but my
>>>>> > >> first impression is that it is some issue with nested
>>>>> > >> transaction support, either on the sqlalchemy or the pymysql
>>>>> > >> side. Also, besides the errors with nested transactions, there
>>>>> > >> are a lot of lock wait timeouts.
>>>>> > >>
>>>>> > >> I think it makes sense to start by reverting the patch that
>>>>> > >> moves to pymysql.
>>>>> > >
>>>>> > > My immediate reaction is that this is perhaps a
>>>>> > > concurrency-related issue; because PyMySQL is pure Python and
>>>>> > > allows for full-blown eventlet monkeypatching, I wonder if
>>>>> > > somehow the same PyMySQL connection is being used in multiple
>>>>> > > contexts. E.g. one greenlet starts a savepoint using identifier
>>>>> > > "_3", which is based on a counter that is local to the
>>>>> > > SQLAlchemy Connection, but then another greenlet somehow shares
>>>>> > > that PyMySQL connection with another SQLAlchemy Connection that
>>>>> > > uses the same identifier.
>>>>> >
>>>>> > Reading more of the log, it seems the main issue is simply a
>>>>> > deadlock on inserting into the securitygroups table. A deadlock
>>>>> > on insert can happen because of an index being locked.
>>>>> >
>>>>> > I'd be curious to know how many greenlets are running
>>>>> > concurrently here, and what the overall transaction looks like
>>>>> > within the operation that is failing (e.g. does each transaction
>>>>> > insert multiple rows into securitygroups? That would make a
>>>>> > deadlock seem more likely).
>>>>>
>>>>> This raises two questions:
>>>>>
>>>>> 1) Are we handling deadlocks with retries? It's important that we
>>>>> do that to be defensive.
>>>>>
>>>>> 2) Are we being careful to sort the table order in any multi-table
>>>>> transactions, so that we minimize the chance of cross-table
>>>>> deadlocks?
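
A few sketches below may help anyone reproducing or fixing this;
bottom-posting them so the thread above stays intact.

To make Mike's hypothesis concrete: a minimal sketch, assuming a
reachable MySQL server and made-up credentials, of why a single PyMySQL
connection must not be shared across greenlets once eventlet
monkeypatching is in effect. This is not Neutron code; unsafe() below
only names the hazard and is deliberately never called, while safe()
shows the one-connection-per-greenlet pattern:

    import eventlet
    eventlet.monkey_patch()  # PyMySQL is pure Python, so its socket I/O now yields

    import pymysql

    def unsafe(shared_conn):
        # Anti-pattern: after monkeypatching, every execute() may yield
        # on socket I/O, so two greenlets can interleave mid-protocol on
        # the one shared connection; savepoint bookkeeping and packet
        # sequencing then get mixed between logically separate
        # transactions.
        with shared_conn.cursor() as cur:
            cur.execute("SAVEPOINT sp_demo")
            cur.execute("RELEASE SAVEPOINT sp_demo")

    def safe(n):
        # Safe pattern: one connection per greenlet (or a pool that
        # checks a connection out to exactly one greenlet at a time), so
        # SAVEPOINT state stays private to that greenlet.
        conn = pymysql.connect(host="127.0.0.1", user="root",
                               password="secret", database="test")
        try:
            with conn.cursor() as cur:
                cur.execute("SAVEPOINT sp_demo")
                cur.execute("RELEASE SAVEPOINT sp_demo")
            conn.commit()
        finally:
            conn.close()

    pool = eventlet.GreenPool()
    for n in range(4):
        pool.spawn(safe, n)
    pool.waitall()

If Mike's reading is right, guaranteeing the safe() pattern at the pool
level is what makes the "_3" savepoint collisions impossible.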
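On Clint's first question (defensive retries): oslo.db already ships
wrap_db_retry for exactly this. A sketch of how a DB operation could opt
in; wrap_db_retry and its arguments are the real oslo.db API, while
create_security_group and context are hypothetical stand-ins, and
session.begin(subtransactions=True) is the Neutron idiom of this era:

    from oslo_db import api as oslo_db_api

    # Re-invokes the decorated function when the driver raises a
    # deadlock error (oslo_db.exception.DBDeadlock), sleeping an
    # increasing, capped interval between attempts.
    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    def create_security_group(context, values):
        with context.session.begin(subtransactions=True):
            # Insert into securitygroups and friends here; the whole
            # transaction re-runs from the top if MySQL reports
            # ER_LOCK_DEADLOCK (1213).
            pass

The important property is that the retry wraps the entire transaction,
not a single statement, since InnoDB rolls the whole victim transaction
back on deadlock.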
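On his second question (lock ordering): a self-contained sketch of one
canonical write order, parent table before child table and rows sorted
by key; SQLite and the two-table schema here are purely illustrative
stand-ins for the real securitygroups tables:

    import sqlalchemy as sa

    engine = sa.create_engine("sqlite://")
    md = sa.MetaData()
    groups = sa.Table("securitygroups", md,
                      sa.Column("id", sa.String(36), primary_key=True))
    rules = sa.Table("securitygrouprules", md,
                     sa.Column("id", sa.String(36), primary_key=True),
                     sa.Column("group_id", sa.String(36)))
    md.create_all(engine)

    def insert_group_with_rules(conn, group_id, rule_ids):
        # One canonical order everywhere: parent table before child
        # table, rows sorted by primary key, so concurrent transactions
        # acquire row/index locks in the same direction instead of
        # head-on.
        with conn.begin():
            conn.execute(groups.insert(), [{"id": group_id}])
            conn.execute(rules.insert(),
                         [{"id": r, "group_id": group_id}
                          for r in sorted(rule_ids)])

    with engine.connect() as conn:
        insert_group_with_rules(conn, "sg-1", ["rule-2", "rule-1"])

Deterministic ordering does not remove every gap-lock deadlock on
secondary indexes, but it does remove the classic opposite-order case,
which is the cheapest win.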
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev