Attila Fazekas <afaze...@redhat.com> wrote:
> I see lot of improvements, > but cPython is still cPython. > > When you benchmarking query related things, please try to > get the actual data from the returned objects that goes without saying. I’ve been benching SQLAlchemy and DBAPIs for many years. New performance improvements tend to be the priority for pretty much every major release. > and try to do > something with data what is not expected to be optimized out even by > a smarter compiler. Well I tend to favor breaking out the different elements into individual tests here, though I guess if you’re trying to trick a JIT then the more composed versions may be more relevant. For example, I could already tell you that the AttributeDict thing would perform terribly without having to mix it up with the DB access. __getattr__ is a poor performer (learned that in SQLAlchemy 0.1 about 9 years ago). > Here is my play script and several numbers: > http://www.fpaste.org/193999/25585380/raw/ > Is there any faster ORM way for the same op? Absolutely, as I’ve been saying for months all the way back in my wiki entry on forward, query for individual columns, also skip the session.rollback() and do a close() instead (the transaction is still rolled back, we just skip the bookkeeping we don’t need). You get the nice attribute access pattern too: http://www.fpaste.org/194098/56040781/ def query_sqla_cols(self): "SQLAlchemy yield(100) named tuples" session = self.Session() start = time.time() summary = 0 for obj in session.query( Ints.id, Ints.A, Ints.B, Ints.C).yield_per(100): summary += obj.id + obj.A + obj.B + obj.C session.rollback() end = time.time() return [end-start, summary] def query_sqla_cols_a3(self): "SQLAlchemy yield(100) named tuples 3*access" session = self.Session() start = time.time() summary = 0 for obj in session.query( Ints.id, Ints.A, Ints.B, Ints.C).yield_per(100): summary += obj.id + obj.A + obj.B + obj.C summary += obj.id + obj.A + obj.B + obj.C summary += obj.id + obj.A + obj.B + obj.C session.rollback() end = time.time() return [end-start, summary/3] Here’s that: 0 SQLAlchemy yield(100) named tuples: time: 0.635045 (data [183560260000L]) 1 SQLAlchemy yield(100) named tuples: time: 0.630911 (data [183560260000L]) 2 SQLAlchemy yield(100) named tuples: time: 0.641687 (data [183560260000L]) 0 SQLAlchemy yield(100) named tuples 3*access: time: 0.807285 (data [183560260000L]) 1 SQLAlchemy yield(100) named tuples 3*access: time: 0.814160 (data [183560260000L]) 2 SQLAlchemy yield(100) named tuples 3*access: time: 0.829011 (data [183560260000L]) compared to the fastest Core test: 0 SQlAlchemy core simple: time: 0.707205 (data [183560260000L]) 1 SQlAlchemy core simple: time: 0.702223 (data [183560260000L]) 2 SQlAlchemy core simple: time: 0.708816 (data [183560260000L]) This is using 1.0’s named tuple which is faster than the one in 0.9. As I discussed in the migration notes I linked, over here http://docs.sqlalchemy.org/en/latest/changelog/migration_10.html#new-keyedtuple-implementation-dramatically-faster is where I discuss how I came up with that named tuple approach. In 0.9, the tuples are much slower (but still faster than straight entities): 0 SQLAlchemy yield(100) named tuples: time: 1.083882 (data [183560260000L]) 1 SQLAlchemy yield(100) named tuples: time: 1.097783 (data [183560260000L]) 2 SQLAlchemy yield(100) named tuples: time: 1.113621 (data [183560260000L]) 0 SQLAlchemy yield(100) named tuples 3*access: time: 1.204280 (data [183560260000L]) 1 SQLAlchemy yield(100) named tuples 3*access: time: 1.245768 (data [183560260000L]) 2 SQLAlchemy yield(100) named tuples 3*access: time: 1.258327 (data [183560260000L]) Also note that the difference in full object fetches for 0.9 vs. 1.0 are quite different: 0.9.8: 0 SQLAlchemy yield(100): time: 2.802273 (data [183560260000L]) 1 SQLAlchemy yield(100): time: 2.778059 (data [183560260000L]) 2 SQLAlchemy yield(100): time: 2.841441 (data [183560260000L]) 1.0: 0 SQLAlchemy yield(100): time: 2.019153 (data [183560260000L]) 1 SQLAlchemy yield(100): time: 2.052810 (data [183560260000L]) 2 SQLAlchemy yield(100): time: 2.000401 (data [183560260000L]) The tests you have here are favoring row fetches over individual query time. The original speed complaints were talking about lots of queries, not as much lots of rows. The baked strategy is appropriate for the lots of queries use case. Feel free to check out the performance examples I linked which break all these out. > Looks like still worth to convert the results to dict, > when you access the data multiple times. > > dict is also the typical input type for the json serializers. > > The plain dict is good enough if you do not want to mange > which part is changed, especially when you are not planning to `save` it. > > ----- Original Message ----- >> From: "Mike Bayer" <mba...@redhat.com> >> To: "OpenStack Development Mailing List (not for usage questions)" >> <openstack-dev@lists.openstack.org> >> Sent: Wednesday, March 4, 2015 11:30:49 PM >> Subject: Re: [openstack-dev] [all] SQLAlchemy performance suite and upcoming >> features (was: [nova] blueprint about >> multiple workers) >> >> >> >> Mike Bayer <mba...@redhat.com> wrote: >> >>> Attila Fazekas <afaze...@redhat.com> wrote: >>> >>>> Hi, >>>> >>>> I wonder what is the planned future of the scheduling. >>>> >>>> The scheduler does a lot of high field number query, >>>> which is CPU expensive when you are using sqlalchemy-orm. >>>> Does anyone tried to switch those operations to sqlalchemy-core ? >>> >>> An upcoming feature in SQLAlchemy 1.0 will remove the vast majority of CPU >>> overhead from the query side of SQLAlchemy ORM by caching all the work done >> >> Just to keep the Openstack community of what’s upcoming, here’s some more >> detail >> on some of the new SQLAlchemy performance features, which are based on the >> goals I first set up last summer at >> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy. >> >> As 1.0 features a lot of new styles of doing things that are primarily in >> the name of performance, in order to help categorize and document these >> techniques, 1.0 includes a performance suite in examples/ which features a >> comprehensive collection of common database idioms run under timing and >> function-count profiling. These idioms are broken into major categories like >> “short selects”, “large resultsets”, “bulk inserts”, and serve not only as a >> way to compare the relative performance of different techniques, but also as >> a way to provide example code categorized into use cases that illustrate the >> variety of ways to achieve that case, including the tradeoffs for each, >> across Core and ORM. So in this case, we can see what the “baked” query >> looks like in the “short_selects” suite, which times how long it takes to >> perform 10000 queries, each of which return one object or row: >> >> https://bitbucket.org/zzzeek/sqlalchemy/src/cc58a605d6cded0594f7db1caa840b3c00b78e5a/examples/performance/short_selects.py?at=ticket_3054#cl-73 >> >> The results of this suite look like the following: >> >> test_orm_query : test a straight ORM query of the full entity. (10000 >> iterations); total time 7.363434 sec >> test_orm_query_cols_only : test an ORM query of only the entity columns. >> (10000 iterations); total time 6.509266 sec >> test_baked_query : test a baked query of the full entity. (10000 iterations); >> total time 1.999689 sec >> test_baked_query_cols_only : test a baked query of only the entity columns. >> (10000 iterations); total time 1.990916 sec >> test_core_new_stmt_each_time : test core, creating a new statement each time. >> (10000 iterations); total time 3.842871 sec >> test_core_reuse_stmt : test core, reusing the same statement (but recompiling >> each time). (10000 iterations); total time 2.806590 sec >> test_core_reuse_stmt_compiled_cache : test core, reusing the same statement + >> compiled cache. (10000 iterations); total time 0.659902 sec >> >> Where above, “test_orm” and “test_baked” are both using the ORM API >> exclusively. We can see that the “baked” approach, returning column tuples >> is almost twice as fast as a naive Core approach, that is, one which >> constructs select() objects each time and does not attempt to use any >> compilation caching. >> >> For the use case of fetching large numbers of rows, we can look at the >> large_resultsets suite >> (https://bitbucket.org/zzzeek/sqlalchemy/src/cc58a605d6cded0594f7db1caa840b3c00b78e5a/examples/performance/large_resultsets.py?at=ticket_3054). >> This suite illustrates a single query which fetches 500K rows. The “Baked” >> approach isn’t relevant here as we are only emitting a query once, however >> the approach we use to fetch rows is significant. Here we can see that >> ORM-based “tuple” approaches are very close in speed to the fetching of rows >> using Core directly. We also have a comparison of Core against raw DBAPI >> access, where we see very little speed improvement; an example where we >> create a very simple object for each DBAPI row fetched is also present to >> illustrate how quickly even the most minimal Python function overhead adds >> up when we do something 500K times. >> >> test_orm_full_objects_list : Load fully tracked ORM objects into one big >> list(). (500000 iterations); total time 11.055097 sec >> test_orm_full_objects_chunks : Load fully tracked ORM objects a chunk at a >> time using yield_per(). (500000 iterations); total time 7.323350 sec >> test_orm_bundles : Load lightweight "bundle" objects using the ORM. (500000 >> iterations); total time 2.128237 sec >> test_orm_columns : Load individual columns into named tuples using the ORM. >> (500000 iterations); total time 1.585236 sec >> test_core_fetchall : Load Core result rows using fetchall. (500000 >> iterations); total time 1.187013 sec >> test_core_fetchmany_w_streaming : Load Core result rows using >> fetchmany/streaming. (500000 iterations); total time 0.945906 sec >> test_core_fetchmany : Load Core result rows using Core / fetchmany. (500000 >> iterations); total time 0.959626 sec >> test_dbapi_fetchall_plus_append_objects : Load rows using DBAPI fetchall(), >> generate an object for each row. (500000 iterations); total time 1.168365 >> sec >> test_dbapi_fetchall_no_object : Load rows using DBAPI fetchall(), don't make >> any objects. (500000 iterations); total time 0.835586 sec >> >> An ongoing document of new SQLAlchemy 1.0 features, performance >> enhancements, and behavior-changing fixes is at >> http://docs.sqlalchemy.org/en/latest/changelog/migration_10.html. >> >> >> >> >>> up until the SQL is emitted, including all the function overhead of >>> building >>> up the Query object, producing a core select() object internally from the >>> Query, working out a large part of the object fetch strategies, and finally >>> the string compilation of the select() into a string as well as organizing >>> the typing information for result columns. With a query that is constructed >>> using the “Baked” feature, all of these steps are cached in memory and held >>> persistently; the same query can then be re-used at which point all of >>> these >>> steps are skipped. The system produces the cache key based on the in-place >>> construction of the Query using lambdas so no major changes to code >>> structure are needed; just the way the Query modifications are performed >>> needs to be preceded with “lambda q:”, essentially. >>> >>> With this approach, the traditional session.query(Model) approach can go >>> from start to SQL being emitted with an order of magnitude less function >>> calls. On the fetch side, fetching individual columns instead of full >>> entities has always been an option with ORM and is about the same speed as >>> a >>> Core fetch of rows. So using ORM with minimal changes to existing ORM code >>> you can get performance even better than you’d get using Core directly, >>> since caching of the string compilation is also added. >>> >>> On the persist side, the new bulk insert / update features provide a bridge >>> from ORM-mapped objects to bulk inserts/updates without any unit of work >>> sorting going on. ORM mapped objects are still more expensive to use in >>> that >>> instantiation and state change is still more expensive, but bulk >>> insert/update accepts dictionaries as well, which again is competitive with >>> a straight Core insert. >>> >>> Both of these features are completed in the master branch, the “baked >>> query” >>> feature just needs documentation, and I’m basically two or three tickets >>> away from beta releases of 1.0. The “Baked” feature itself lives as an >>> extension and if we really wanted, I could backport it into oslo.db as well >>> so that it works against 0.9. >>> >>> So I’d ask that folks please hold off on any kind of migration from ORM to >>> Core for performance reasons. I’ve spent the past several months adding >>> features directly to SQLAlchemy that allow an ORM-based app to have routes >>> to operations that perform just as fast as that of Core without a rewrite >>> of >>> code. >>> >>>> The scheduler does lot of thing in the application, like filtering >>>> what can be done on the DB level more efficiently. Why it is not done >>>> on the DB side ? >>>> >>>> There are use cases when the scheduler would need to know even more data, >>>> Is there a plan for keeping `everything` in all schedulers process memory >>>> up-to-date ? >>>> (Maybe zookeeper) >>>> >>>> The opposite way would be to move most operation into the DB side, >>>> since the DB already knows everything. >>>> (stored procedures ?) >>>> >>>> Best Regards, >>>> Attila >>>> >>>> >>>> ----- Original Message ----- >>>>> From: "Rui Chen" <chenrui.m...@gmail.com> >>>>> To: "OpenStack Development Mailing List (not for usage questions)" >>>>> <openstack-dev@lists.openstack.org> >>>>> Sent: Wednesday, March 4, 2015 4:51:07 AM >>>>> Subject: [openstack-dev] [nova] blueprint about multiple workers >>>>> supported in nova-scheduler >>>>> >>>>> Hi all, >>>>> >>>>> I want to make it easy to launch a bunch of scheduler processes on a >>>>> host, >>>>> multiple scheduler workers will make use of multiple processors of host >>>>> and >>>>> enhance the performance of nova-scheduler. >>>>> >>>>> I had registered a blueprint and commit a patch to implement it. >>>>> https://blueprints.launchpad.net/nova/+spec/scheduler-multiple-workers-support >>>>> >>>>> This patch had applied in our performance environment and pass some test >>>>> cases, like: concurrent booting multiple instances, currently we didn't >>>>> find >>>>> inconsistent issue. >>>>> >>>>> IMO, nova-scheduler should been scaled horizontally on easily way, the >>>>> multiple workers should been supported as an out of box feature. >>>>> >>>>> Please feel free to discuss this feature, thanks. >>>>> >>>>> Best Regards >>>>> >>>>> >>>>> __________________________________________________________________________ >>>>> OpenStack Development Mailing List (not for usage questions) >>>>> Unsubscribe: >>>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>>> >>>> __________________________________________________________________________ >>>> OpenStack Development Mailing List (not for usage questions) >>>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>> >>> __________________________________________________________________________ >>> OpenStack Development Mailing List (not for usage questions) >>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> >> __________________________________________________________________________ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev