On 16/07/14 01:50, Vishvananda Ishaya wrote:
> On Jul 15, 2014, at 3:30 PM, Ihar Hrachyshka <ihrac...@redhat.com> wrote:
>> On 14/07/14 22:48, Vishvananda Ishaya wrote:
>>> On Jul 13, 2014, at 9:29 AM, Ihar Hrachyshka <ihrac...@redhat.com> wrote:
>>>> On 12/07/14 03:17, Mike Bayer wrote:
>>>>> On 7/11/14, 7:26 PM, Carl Baldwin wrote:
>>>>>> On Jul 11, 2014 5:32 PM, "Vishvananda Ishaya" <vishvana...@gmail.com> wrote:
>>>>>>>
>>>>>>> I have tried using pymysql in place of mysqldb, and in real world concurrency tests against cinder and nova it performs slower. I was inspired by the mention of mysql-connector, so I just tried that option instead. Mysql-connector seems to be slightly slower as well, which leads me to believe that the blocking inside of
>>>>>>
>>>>>> Do you have some numbers? "Seems to be slightly slower" doesn't really stand up as an argument against the numbers that have been posted in this thread.
>>>
>>> Numbers are highly dependent on a number of other factors, but I was seeing 100 concurrent list commands against cinder going from an average of 400 ms to an average of around 600 ms with both mysql-connector and pymysql.
>>
>> I've made my tests on neutron only, so there is a possibility that cinder behaves differently.
>>
>> But those numbers don't tell a lot in terms of considering the switch. Do you have numbers for the mysqldb case?
>
> Sorry if my commentary above was unclear. The 400 ms is mysqldb. The 600 ms average was the same for both of the other options.
>
>>> It is also worth mentioning that my test of 100 concurrent creates from the same project in cinder leads to average response times over 3 seconds. Note that creates return before the request is sent to the node for processing, so this is just the api creating the db record and sticking a message on the queue. A huge part of the slowdown is in quota reservation processing, which does a row lock on the project id.
>>
>> Again, are those 3 seconds better or worse than what we have for mysqldb?
>
> The 3 seconds is from mysqldb. I don't have average response times for mysql-connector due to the timeouts I mention below.
>
>>> Before we are sure that an eventlet-friendly backend "gets rid of all deadlocks", I will mention that trying this test against connector leads to some requests timing out at our load balancer (5 minute timeout), so we may actually be introducing deadlocks where the retry_on_deadlock operator is used.
>>
>> Deadlocks != timeouts. I attempt to fix eventlet-triggered db deadlocks, not all possible deadlocks that you may envision, or timeouts.
>
> That may be true, but if switching the default is trading one problem for another, it isn't necessarily the right fix. The timeout means that one or more greenthreads are never actually generating a response. I suspect an endless retry_on_deadlock loop between a couple of competing greenthreads, which we don't hit with mysqldb, but it could be any number of things.
>
>>> Consider the above anecdotal for the moment, since I can't verify for sure that switching the sql driver didn't introduce some other race or unrelated problem.
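A side note on the retry_on_deadlock point, since it comes up again below: the pattern under discussion is, roughly, a decorator that replays a DB operation when the server reports a deadlock. The sketch below is illustrative only, not the actual nova/cinder code; the exception class and the quota function are stand-ins.

    import functools
    import time

    class DBDeadlockError(Exception):
        """Stand-in for the deadlock error raised by the DB driver / oslo.db."""

    def retry_on_deadlock(max_retries=5, delay=0.5):
        """Re-run the wrapped DB operation when the database reports a deadlock."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                attempt = 0
                while True:
                    try:
                        return func(*args, **kwargs)
                    except DBDeadlockError:
                        attempt += 1
                        if attempt > max_retries:
                            raise
                        # back off a bit so the competing transaction can finish
                        time.sleep(delay)
            return wrapper
        return decorator

    @retry_on_deadlock(max_retries=5)
    def reserve_quota(project_id, delta):
        # illustrative: SELECT ... FOR UPDATE on the project's quota row,
        # update usage, commit; a deadlock here triggers a retry above
        pass

If competing greenthreads keep deadlocking on the same row (e.g. the per-project quota row) and the retry loop is generous or unbounded, a request can spin long enough to hit a 5-minute load balancer timeout, which would match the behaviour described above.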
>>> Let me just caution that we can't recommend replacing our mysql backend without real performance and load testing.
>>
>> I agree. Not saying that the tests are complete by any means, but here is what I have been working on for the last two days.
>>
>> There is a nice openstack project called Rally that is designed to allow easy benchmarks for openstack projects. They have four scenarios for neutron implemented: for networks, ports, routers, and subnets. Each scenario combines create and list commands.
>>
>> I've run each test with the following runner settings: times = 100, concurrency = 10, meaning each scenario is run 100 times, with no more than 10 scenarios running in parallel. Then I've repeated the same for times = 100, concurrency = 20 (also setting max_pool_size to 20 to allow sqlalchemy to utilize that level of parallelism), and for times = 1000, concurrency = 100 (same note on sqlalchemy parallelism).
>>
>> You can find detailed html files with nice graphs here [1]. A brief description of the results is below:
>>
>> 1. create_and_list_networks scenario: for 10 parallel workers the performance boost is -12.5% from the original time, for 20 workers -6.3%; for 100 workers there is a slight increase in average time spent per scenario, +9.4% (this is the only scenario that showed a slight reduction in performance; I'll try to rerun the test tomorrow to see whether some discrepancy during that run influenced the result).
>>
>> 2. create_and_list_ports scenario: for 10 parallel workers the boost is -25.8%, for 20 workers it's -9.4%, and for 100 workers it's -12.6%.
>>
>> 3. create_and_list_routers scenario: for 10 parallel workers the boost is -46.6% (almost half of the original time), for 20 workers it's -51.7% (more than half), for 100 workers it's -41.5%.
>>
>> 4. create_and_list_subnets scenario: for 10 parallel workers the boost is -26.4%, for 20 workers it's -51.1% (more than a 50% reduction in time spent on the average scenario), and for 100 workers it's -31.7%.
>>
>> I've tried to check how it scales up to 200 parallel workers, but was hit by local open file limits and the mysql max_connections setting. I will retry my tests tomorrow with those limits raised to see how it handles that load.
>>
>> Tomorrow I will also try to test the new library with multiple API workers.
>>
>> Other than that, what are your suggestions on what to check/test?
>
> Testing other projects in addition seems very important. Make sure that improving neutron isn't going to kill cinder and nova.
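By the way, in case anyone wants to reproduce the runs: the Rally runner settings boil down to just "times" and "concurrency" in the task file. A rough sketch of how such a task file can be generated is below; the scenario name and the "constant" runner type are written from memory, so double-check them against your Rally checkout before use.

    import json

    # Settings for the 10-worker run; bump "concurrency" (and max_pool_size in
    # neutron.conf) to 20 or 100 for the other runs.
    task = {
        "NeutronNetworks.create_and_list_networks": [
            {
                "runner": {
                    "type": "constant",   # run the scenario a fixed number of times
                    "times": 100,         # total iterations of the scenario
                    "concurrency": 10,    # how many run in parallel
                },
            },
        ],
    }

    with open("create_and_list_networks.json", "w") as f:
        json.dump(task, f, indent=4)

    # then: rally task start create_and_list_networks.json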
A bit of an update: today I've tried to run the Cinder scenarios available in Rally with both drivers.

1. CinderVolumes.create_and_delete_volume scenario:
- with 2 parallel workers: 21.712 sec (mysqldb), 22.881 sec (mysqlconnector), performance drop = -5.4%
- with 5 parallel workers: 53.549 sec (mysqldb), 50.948 sec (mysqlconnector), performance gain = +4.9%
- with 10 parallel workers: 114.664 sec (mysqldb), 109.838 sec (mysqlconnector), performance gain = +4.2%

2. CinderVolumes.create_and_list_volume scenario:
- with 2 parallel workers: 2.554 sec (mysqldb), 2.556 sec (mysqlconnector), performance drop = 0%
- with 5 parallel workers: 2.889 sec (mysqldb), 3.107 sec (mysqlconnector), performance drop = -7.5%
- with 10 parallel workers: 9.729 sec (mysqldb), 4.395 sec (mysqlconnector), performance gain = +54.8% (sic!)

3. CinderVolumes.create_volume scenario:
- with 2 parallel workers: 2.65 sec (mysqldb), 2.48 sec (mysqlconnector), performance gain = +6.4%
- with 5 parallel workers: 2.807 sec (mysqldb), 3.341 sec (mysqlconnector), performance drop = -19% (sic!)
- with 10 parallel workers: 7.472 sec (mysqldb), 5.02 sec (mysqlconnector), performance gain = +32.8% (sic!)

For some reason the results are somewhat noisy. There is no clear trend of performance gains as observed in Neutron, though on average mysql-connector still seems to work better.

Tomorrow I'll try the Nova tests, and will return to Neutron to test multiple API workers there.

Note: I still haven't managed to reach the MySQL Connector author. If I fail to do so, we may consider another driver, like pymysql.

> Vish
>
>> FYI: [1] contains the following directories:
>>
>> mysqlconnector/
>> mysqldb/
>>
>> Each of them contains the following directories:
>>
>> 10-10/   - 10 parallel workers, max_pool_size = 10 (default)
>> 20-100/  - 20 parallel workers, max_pool_size = 100
>> 100-100/ - 100 parallel workers, max_pool_size = 100
>>
>> Happy analysis!
>>
>> [1]: http://people.redhat.com/~ihrachys/
>>
>> /Ihar
>>
>>> Vish
>>>
>>>>>>> sqlalchemy is not the main bottleneck across projects.
>>>>>>>
>>>>>>> Vish
>>>>>>>
>>>>>>> P.S. The performance in all cases was abysmal, so performance work definitely needs to be done, but just the guess that replacing our mysql library is going to solve all of our performance problems appears to be incorrect at first blush.
>>>>>>
>>>>>> The motivation is still mostly deadlock relief, but more performance work should be done. I agree with you there. I'm still hopeful for some improvement from this.
>>>>>
>>>>> To identify performance that's alleviated by async you have to establish up front that IO blocking is the issue, which would entail having code that's blazing fast until you start running it against concurrent connections, at which point you can identify via profiling that IO operations are being serialized. This is a very specific issue.
>>>>>
>>>>> In contrast, to identify why some arbitrary openstack app is slow, my bet is that async is often not the big issue.
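Coming back to the driver question for a moment: switching between the drivers is essentially a one-line change, since sqlalchemy picks the DBAPI from the dialect prefix of the connection URL. A minimal illustration, with placeholder credentials, host, and database name:

    from sqlalchemy import create_engine

    # The dialect prefix selects the DBAPI; the rest of the URL stays the same.
    MYSQLDB_URL   = "mysql+mysqldb://neutron:secret@127.0.0.1/neutron"
    CONNECTOR_URL = "mysql+mysqlconnector://neutron:secret@127.0.0.1/neutron"
    PYMYSQL_URL   = "mysql+pymysql://neutron:secret@127.0.0.1/neutron"

    # In the services this URL is the "connection" option in the [database]
    # section; below is the bare sqlalchemy equivalent, with a pool sized to
    # match the concurrency of the test run.
    engine = create_engine(CONNECTOR_URL, pool_size=20)

That is also why comparing drivers with Rally only requires restarting the services with a different connection URL.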
>>>>> Every day I look at openstack code and talk to people working on things, I see many performance issues that have nothing to do with concurrency, and as I detailed in my wiki page at https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy there is a long road to cleaning up all the excessive queries, hundreds of unnecessary rows and columns being pulled over the network, unindexed lookups, subquery joins, hammering of Python-intensive operations (often due to the nature of OS apps as lots and lots of tiny API calls) that can be cached. There's a clear path to tons better performance documented there, and most of it is not about async - which means that successful async isn't going to solve all those issues.
>>>>
>>>> Of course there is a long road to decent performance, and switching a library won't magically fix all our issues. But if it fixes deadlocks and gives a 30% to 150% performance boost for different operations, and since the switch is almost smooth, this is something worth doing.

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev