On Fri, 2014-03-21 at 01:02 +0200, Alexei Kornienko wrote:
> On 03/21/2014 12:53 AM, Jay Pipes wrote:
> > On Fri, 2014-03-21 at 00:32 +0200, Alexei Kornienko wrote:
> >> On 03/21/2014 12:15 AM, Jay Pipes wrote:
> >>> On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:
> >>>> Hello,
> >>>>
> >>>> We've done some profiling and the results are quite interesting:
> >>>> during 1.5 hours Ceilometer inserted 59755 events (59755 calls to
> >>>> record_metering_data). These calls resulted in a total of 2591573
> >>>> SQL queries.
> >>> Yes, this matches my own experience with Ceilo+MySQL. But do not
> >>> assume that there are 2591573/59755, or around 43, queries per
> >>> record meter event. That is misleading. In fact, the number of
> >>> queries per record meter event increases over time, as the number
> >>> of retries climbs due to contention between readers and writers.
> >>>
> >>>> And the most interesting part is that 291569 of those queries
> >>>> were ROLLBACK queries.
> >>> Yep, I noted that as well. But this is not unique to Ceilometer by
> >>> any means. Just take a look at any database serving Nova, Cinder,
> >>> Glance, or anything that uses the common SQLAlchemy code. You will
> >>> see a huge percentage of the total number of queries taken up by
> >>> ROLLBACK statements. The problem in Ceilometer is just that the
> >>> write:read ratio is much higher than in any of the other projects.
> >>>
> >>> I had a suspicion that the rollbacks have to do with the way the
> >>> oslo.db retry logic works, but I never had a chance to investigate
> >>> further. I would be really interested to see similar stats against
> >>> PostgreSQL, to see whether the rollback issue is isolated to MySQL
> >>> (I suspect it is).
> >> Rollbacks are caused not by retry logic but by the create_or_update
> >> logic: we first try to do an INSERT in a sub-transaction; when it
> >> fails, we roll back that transaction and do an UPDATE instead.
> > No, that isn't correct, AFAIK.
> > We first do a SELECT into the table and then, if there is no
> > result, try an INSERT:
> >
> > https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L286-L292
> >
> > The problem, IMO, is twofold. There do not need to be nested
> > transactional containers around these create_or_update lookups --
> > i.e. the lookups can be done outside of the main transaction begun
> > here:
> >
> > https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L335
> I'm afraid you are wrong here:
>
> nested = session.connection().dialect.name != 'sqlite'  # always True for MySQL
> if not nested and session.query(model_class).get(str(_id)):  # always False
>
> The `and` short-circuits, so no SELECT is ever performed on MySQL.
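To make the pattern under discussion concrete, here is a rough, illustrative sketch of the whole create_or_update flow using the stdlib sqlite3 module: skip the lookup on anything but sqlite, attempt the INSERT inside a nested sub-transaction (a SAVEPOINT), and on a duplicate key roll the sub-transaction back and do an UPDATE instead. The table, column, and function names are made up for the example; this is not the actual impl_sqlalchemy.py code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions by hand
conn.execute("CREATE TABLE resource (id TEXT PRIMARY KEY, meta TEXT)")

def create_or_update(conn, dialect_name, rid, meta):
    nested = dialect_name != 'sqlite'   # always True for MySQL
    # On sqlite only: look the row up first and update it if present.
    # On MySQL the `and` short-circuits, so no SELECT is ever issued.
    if not nested and conn.execute(
            "SELECT 1 FROM resource WHERE id = ?", (rid,)).fetchone():
        conn.execute("UPDATE resource SET meta = ? WHERE id = ?",
                     (meta, rid))
        return
    conn.execute("SAVEPOINT create_row")   # the nested sub-transaction
    try:
        conn.execute("INSERT INTO resource VALUES (?, ?)", (rid, meta))
    except sqlite3.IntegrityError:
        # Duplicate key: roll the sub-transaction back and UPDATE
        # instead. This is the ROLLBACK that dominates the profile.
        conn.execute("ROLLBACK TO SAVEPOINT create_row")
        conn.execute("UPDATE resource SET meta = ? WHERE id = ?",
                     (meta, rid))
    conn.execute("RELEASE SAVEPOINT create_row")

create_or_update(conn, 'mysql', 'r1', 'first')   # fresh row: INSERT
create_or_update(conn, 'mysql', 'r1', 'second')  # dup: ROLLBACK + UPDATE
print(conn.execute("SELECT meta FROM resource WHERE id='r1'").fetchone())
# -> ('second',)
```

Every sample that refers to an already-known resource takes the IntegrityError branch, which is where rollback-per-insert behaviour like the 291569 ROLLBACKs above comes from.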
Doh, true enough! /me wonders why this is written like so (only for
sqlite...?)

> > Secondly, given the volume of inserts (which also generate
> > selects), a simple memcache lookup cache would be highly beneficial
> > in cutting down on writer/reader contention in MySQL.
> You are right, but I'm afraid that adding memcache will make
> deployment more complicated.
>
> > These are things that can be done without changing the schema
> > (which has other issues that can be looked at, of course).
> >
> > Best,
> > -jay
> >
> >> This is caused by a poorly designed schema that requires such
> >> hacks. Because of this, I suspect that we'll have similar results
> >> for PostgreSQL.
> >>
> >> Tomorrow we'll do the same tests with PostgreSQL and MongoDB to
> >> see if there is any difference.
> >>
> >>> Best,
> >>> -jay
> >>>
> >>>> We do around 5 rollbacks to record a single event!
> >>>>
> >>>> I guess this means that the MySQL backend is currently totally
> >>>> unusable in a production environment.
> >>>>
> >>>> Please find a full profiling graph attached.
> >>>>
> >>>> Regards,
> >>>>
> >>>> On 03/20/2014 10:31 PM, Sean Dague wrote:
> >>>>
> >>>>> On 03/20/2014 01:01 PM, David Kranz wrote:
> >>>>>> On 03/20/2014 12:31 PM, Sean Dague wrote:
> >>>>>>> On 03/20/2014 11:35 AM, David Kranz wrote:
> >>>>>>>> On 03/20/2014 06:15 AM, Sean Dague wrote:
> >>>>>>>>> On 03/20/2014 05:49 AM, Nadya Privalova wrote:
> >>>>>>>>>> Hi all,
> >>>>>>>>>> First of all, thanks for your suggestions!
> >>>>>>>>>>
> >>>>>>>>>> To summarize the discussions here:
> >>>>>>>>>> 1. We are not going to install Mongo (because "it's
> >>>>>>>>>> wrong"?)
> >>>>>>>>> We are not going to install Mongo "not from the base
> >>>>>>>>> distribution", because we don't do that for things that
> >>>>>>>>> aren't Python. Our assumption is that dependent services
> >>>>>>>>> come from the base OS.
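Circling back to the memcache lookup cache suggested above: the idea is simply that the existence check for a given resource id only has to hit the database the first time that id is seen, so repeated samples for the same resource skip the SELECT/INSERT dance entirely. A rough sketch of the shape it could take, with a plain dict standing in for the memcache client (only the get()/set() interface matters here, and all names are illustrative):

```python
class DictCache:
    """Stand-in for a memcache client: the same get()/set() shape."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

class LookupCache:
    """Skip the per-sample existence check for ids already seen."""

    def __init__(self, cache, create_fn):
        self.cache = cache          # anything with get()/set()
        self.create_fn = create_fn  # the expensive get-or-create DB call
        self.db_calls = 0           # instrumentation for the example

    def ensure(self, _id):
        if self.cache.get(_id) is not None:
            return                  # cache hit: no database round trip
        self.db_calls += 1
        self.create_fn(_id)         # only runs on the first sighting
        self.cache.set(_id, 1)

created = []
lookup = LookupCache(DictCache(), created.append)
for _ in range(1000):
    lookup.ensure('resource-1')     # 1000 samples, same resource
print(lookup.db_calls)  # -> 1: only the first sample touched the DB
```

The deployment-complexity concern stands, of course; nothing here requires memcached specifically -- any shared get/set store (or even an in-process LRU, at the cost of per-worker warm-up) gives the same reduction in reads.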
> >>>>>>>>>
> >>>>>>>>> That being said, being an integrated project means you have
> >>>>>>>>> to be able to function, sanely, on an sqla backend, as that
> >>>>>>>>> will always be part of your gate.
> >>>>>>>> This is a claim I think needs a bit more scrutiny if by
> >>>>>>>> "sanely" you mean "performant". It seems we have an
> >>>>>>>> integrated project that no one would deploy using the SQL db
> >>>>>>>> driver we have in the gate. Is anyone doing that? Is having a
> >>>>>>>> scalable SQL back end a goal of ceilometer?
> >>>>>>>>
> >>>>>>>> More generally, if there is functionality that is of great
> >>>>>>>> importance to any cloud deployment (and we would not
> >>>>>>>> integrate it if we didn't think it was) that cannot be
> >>>>>>>> deployed at scale using sqla, are we really going to say it
> >>>>>>>> should not be a part of OpenStack because we refuse, for
> >>>>>>>> whatever reason, to run it in our gate using a driver that
> >>>>>>>> would actually be used? And if we do demand an sqla backend,
> >>>>>>>> how much time should we spend trying to optimize it if no one
> >>>>>>>> will really use it? Though the slow heat job is a little
> >>>>>>>> different, because the slowness there comes directly from
> >>>>>>>> running real use cases, perhaps we should just set up a "slow
> >>>>>>>> ceilometer" job if the sql version is too slow for its budget
> >>>>>>>> in the main job.
> >>>>>>>>
> >>>>>>>> It seems like there is a similar thread, at least in part,
> >>>>>>>> about this around Marconi.
> >>>>>>> We required a non-Mongo backend to graduate ceilometer. So I
> >>>>>>> don't think it's too much to ask that it actually works.
> >>>>>>>
> >>>>>>> If the answer is that it will never work and it was a checkbox
> >>>>>>> with no intent to make it work, then it should be deprecated
> >>>>>>> and removed from the tree in Juno, with a big WARNING that you
> >>>>>>> shouldn't ever use that backend.
> >>>>>>> Like Nova now does with all the virt drivers that aren't
> >>>>>>> tested upstream.
> >>>>>>>
> >>>>>>> Shipping in-tree code that you don't want people to use is bad
> >>>>>>> for users. Either commit to making it work, or deprecate it
> >>>>>>> and remove it.
> >>>>>>>
> >>>>>>> I don't see this as the same issue as the slow heat job. Heat,
> >>>>>>> architecturally, is going to be slow. It spins up real OSes
> >>>>>>> and does real things to them. There is no way that's ever
> >>>>>>> going to be fast, and the dedicated job was a recognition that
> >>>>>>> to support this level of services in OpenStack we need to give
> >>>>>>> them more breathing room.
> >>>>>> Peace. I specifically noted that difference in my original
> >>>>>> comment. And for that reason the heat slow job may not be
> >>>>>> temporary.
> >>>>>>> Architecturally, Ceilometer should not be this expensive.
> >>>>>>> We've got some data showing it to be aberrant from where we
> >>>>>>> believe it should be. We should fix that.
> >>>>>> There are plenty of cases where we have had code that passes
> >>>>>> gate tests with acceptable performance but falls over in real
> >>>>>> deployment. I'm just saying that having a driver that works OK
> >>>>>> in the gate but does not work for real deployments is of no
> >>>>>> more value than not having it at all. Maybe less value.
> >>>>>> How do you propose to solve the problem of getting more
> >>>>>> ceilometer tests into the gate in the short run? As a practical
> >>>>>> measure, I don't see why it is so bad to have a separate job
> >>>>>> until the complex issue of whether it is possible to have a
> >>>>>> real-world performant sqla backend is resolved.
> >>>>>> Or did I miss something, and has it already been determined
> >>>>>> that sqla could be used for large-scale deployments if we just
> >>>>>> fixed our code?
> >>>>> I think right now the ball is back in the ceilometer court to do
> >>>>> some performance profiling, and let's see what comes of that.
> >>>>> I don't think we're getting more tests before the release in
> >>>>> any real way.
> >>>>>
> >>>>>>> Once we get a base OS in the gate that lets us direct-install
> >>>>>>> Mongo from base packages, we can also do that. Or someone can
> >>>>>>> third-party it today. Then we'll even have comparative results
> >>>>>>> to understand the differences.
> >>>>>> Yes. Do you know which base OSes are candidates for that?
> >>>>> Ubuntu 14.04 will have a sufficient level of Mongo, so some time
> >>>>> in the Juno cycle we should have it in the gate.
> >>>>>
> >>>>> -Sean
> >>>>>
> >>>>> _______________________________________________
> >>>>> OpenStack-dev mailing list
> >>>>> OpenStack-dev@lists.openstack.org
> >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev