On 14/07/14 22:48, Vishvananda Ishaya wrote:
>
> On Jul 13, 2014, at 9:29 AM, Ihar Hrachyshka <ihrac...@redhat.com>
> wrote:
>
>> On 12/07/14 03:17, Mike Bayer wrote:
>>>
>>> On 7/11/14, 7:26 PM, Carl Baldwin wrote:
>>>>
>>>> On Jul 11, 2014 5:32 PM, "Vishvananda Ishaya"
>>>> <vishvana...@gmail.com> wrote:
>>>>>
>>>>> I have tried using pymysql in place of mysqldb, and in
>>>>> real-world concurrency tests against cinder and nova it
>>>>> performs slower. I was inspired by the mention of
>>>>> mysql-connector, so I just tried that option instead.
>>>>> Mysql-connector seems to be slightly slower as well, which
>>>>> leads me to believe that the blocking inside of
>>>>
>>>> Do you have some numbers? "Seems to be slightly slower"
>>>> doesn't really stand up as an argument against the numbers
>>>> that have been posted in this thread.
>
> Numbers are highly dependent on a number of other factors, but I
> was seeing 100 concurrent list commands against cinder going from
> an average of 400 ms to an average of around 600 ms with both
> mysql-connector and pymysql.
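Just to make sure we are measuring the same kind of thing: I read
"100 concurrent list commands" as something along the lines of the
rough sketch below. It is neutron-flavoured, since that is what I
test, and the client setup, credentials and worker count are my
assumptions, not necessarily your harness:

    # rough sketch of timing N concurrent list calls with eventlet;
    # each greenthread uses its own client, so the first call also
    # pays for token retrieval
    import os
    import time

    import eventlet
    eventlet.monkey_patch()

    from neutronclient.v2_0 import client


    def timed_list(_):
        neutron = client.Client(
            username=os.environ['OS_USERNAME'],
            password=os.environ['OS_PASSWORD'],
            tenant_name=os.environ['OS_TENANT_NAME'],
            auth_url=os.environ['OS_AUTH_URL'])
        start = time.time()
        neutron.list_networks()
        return time.time() - start

    pool = eventlet.GreenPool(10)  # 10 concurrent workers
    timings = list(pool.imap(timed_list, range(100)))  # 100 calls total
    print('average: %.0f ms' % (1000 * sum(timings) / len(timings)))

If your harness differs significantly from that, please say so,
since it affects how comparable our numbers are.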
I've run my tests against neutron only, so it is possible that cinder
behaves differently. But those numbers alone don't tell us much about
whether to switch: do you have numbers for the mysqldb case to
compare against?

> It is also worth mentioning that my test of 100 concurrent creates
> from the same project in cinder leads to average response times
> over 3 seconds. Note that creates return before the request is sent
> to the node for processing, so this is just the api creating the db
> record and sticking a message on the queue. A huge part of the
> slowdown is in quota reservation processing which does a row lock
> on the project id.

Again, are those 3 seconds better or worse than what we get with
mysqldb?

> Before we are sure that an eventlet friendly backend “gets rid of
> all deadlocks”, I will mention that trying this test against
> connector leads to some requests timing out at our load balancer (5
> minute timeout), so we may actually be introducing deadlocks where
> the retry_on_deadlock operator is used.

Deadlocks != timeouts. I'm attempting to fix the eventlet-triggered
db deadlocks, not every possible deadlock you may envision, and not
timeouts.

> Consider the above anecdotal for the moment, since I can’t verify
> for sure that switching the sql driver didn’t introduce some other
> race or unrelated problem.
>
> Let me just caution that we can’t recommend replacing our mysql
> backend without real performance and load testing.

I agree. I'm not claiming the tests are complete, but here is what
I've been working on for the last two days.

There is a nice OpenStack project called Rally that is designed to
make benchmarking OpenStack projects easy. It has four scenarios
implemented for neutron: networks, ports, routers, and subnets. Each
scenario combines create and list commands.

I ran each scenario with the following runner settings: times = 100,
concurrency = 10, meaning each scenario is executed 100 times in
total with no more than 10 executions running in parallel. I then
repeated the runs with times = 100, concurrency = 20 (also raising
max_pool_size so that sqlalchemy can utilize that level of
parallelism), and with times = 1000, concurrency = 100 (same note on
sqlalchemy parallelism applies). A sketch of the task definition is
included below.

You can find detailed html files with nice graphs at [1]. A brief
summary of the results:

1. create_and_list_networks: for 10 parallel workers the boost is
-12.5% of the original time, for 20 workers -6.3%; for 100 workers
there is a slight increase of +9.4% in the average time per scenario
(this is the only scenario that showed a performance regression; I'll
rerun the test tomorrow to see whether some discrepancy during
execution influenced the result).

2. create_and_list_ports: for 10 parallel workers the boost is
-25.8%, for 20 workers -9.4%, and for 100 workers -12.6%.

3. create_and_list_routers: for 10 parallel workers the boost is
-46.6% (almost half of the original time), for 20 workers -51.7%
(more than half), and for 100 workers -41.5%.

4. create_and_list_subnets: for 10 parallel workers the boost is
-26.4%, for 20 workers -51.1% (more than half of the average scenario
time), and for 100 workers -31.7%.

I also tried to check how it scales up to 200 parallel workers, but
ran into local open-file limits and the mysql max_connections
setting. I will retry with those limits raised tomorrow to see how it
handles that load.
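For reference, here is roughly what the task definition for the first
scenario looks like with the runner settings described above. This is
a sketch only; the exact scenario arguments and context may differ
depending on your Rally version, and times/concurrency were varied
per run as described:

    {
        "NeutronNetworks.create_and_list_networks": [
            {
                "runner": {
                    "type": "constant",
                    "times": 100,
                    "concurrency": 10
                },
                "context": {
                    "users": {
                        "tenants": 1,
                        "users_per_tenant": 1
                    }
                }
            }
        ]
    }

Each task file is then run with "rally task start", once against a
deployment configured with mysqldb and once with mysql-connector.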
Tomorrow I will also try testing the new library with multiple API
workers. Other than that, what are your suggestions on what to
check/test?

FYI: [1] contains the following directories:

mysqlconnector/
mysqldb/

Each of them contains the following directories:

10-10/   - 10 parallel workers, max_pool_size = 10 (default)
20-100/  - 20 parallel workers, max_pool_size = 100
100-100/ - 100 parallel workers, max_pool_size = 100

Happy analysis!

[1]: http://people.redhat.com/~ihrachys/

/Ihar

> Vish
>
>>>>> sqlalchemy is not the main bottleneck across projects.
>>>>>
>>>>> Vish
>>>>>
>>>>> P.S. The performance in all cases was abysmal, so performance
>>>>> work definitely needs to be done, but just the guess that
>>>>> replacing our mysql library is going to solve all of our
>>>>> performance problems appears to be incorrect at first blush.
>>>>
>>>> The motivation is still mostly deadlock relief, but more
>>>> performance work should be done. I agree with you there. I'm
>>>> still hopeful for some improvement from this.
>>>
>>> To identify performance that's alleviated by async, you have to
>>> establish up front that IO blocking is the issue, which would
>>> entail having code that's blazing fast until you start running
>>> it against concurrent connections, at which point you can
>>> identify via profiling that IO operations are being serialized.
>>> This is a very specific issue.
>>>
>>> In contrast, to identify why some arbitrary openstack app is
>>> slow, my bet is that async is often not the big issue. Every
>>> day I look at openstack code and talk to people working on
>>> things, I see many performance issues that have nothing to do
>>> with concurrency, and as I detailed in my wiki page at
>>> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy there
>>> is a long road to cleaning up all the excessive queries,
>>> hundreds of unnecessary rows and columns being pulled over the
>>> network, unindexed lookups, subquery joins, and hammering of
>>> Python-intensive operations (often due to the nature of OS apps
>>> as lots and lots of tiny API calls) that can be cached.
>>> There's a clear path to much better performance documented
>>> there, and most of it is not about async - which means that
>>> successful async isn't going to solve all those issues.
>>
>> Of course there is a long road to decent performance, and
>> switching a library won't magically fix all our issues. But if it
>> fixes deadlocks, gives a 30% to 150% performance boost for
>> different operations, and the switch is almost smooth, then it is
>> something worth doing.
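(To give an idea of what "almost smooth" means in practice: on the
neutron side the switch is essentially a change of the connection URL
in neutron.conf, plus a larger pool for the high-concurrency runs.
The snippet below is a sketch; host, credentials and database name
are placeholders.)

    [database]
    # default MySQLdb driver:
    #connection = mysql://neutron:NEUTRON_PASS@127.0.0.1/neutron
    # MySQL Connector/Python driver:
    connection = mysql+mysqlconnector://neutron:NEUTRON_PASS@127.0.0.1/neutron
    # 10 by default (the 10-10 runs above); raised to 100 for the
    # 20- and 100-worker runs
    max_pool_size = 100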
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev