2018-08-17 2:44 GMT+08:00 Dan Smith <d...@danplanet.com>:

> > Yes, the DB queries were run in serial. After some investigation, it
> > seems that we were unable to perform eventlet.monkey_patch in uWSGI
> > mode, so Yikun made this fix:
> >
> > https://review.openstack.org/#/c/592285/
>
> Cool, good catch :)
>
> > After making this change, we tested again and got this data:
> >
> >                        total    collect  sort    view
> > before monkey_patch    13.5745  11.7012  1.1511  0.5966
> > after monkey_patch     12.8367  10.5471  1.5642  0.6041
> >
> > The performance improved a little, and from the log we can see:
>
> Since these all took ~1s when done in series, but now take ~10s in
> parallel, I think you must be hitting some performance bottleneck in
> either case, which is why the overall time barely changes. Some ideas:
>
> 1. In the real world, I think you really need to have 10x database
> servers or at least a DB server with plenty of cores loading from a
> very fast (or separate) disk in order to really ensure you're getting
> full parallelism of the DB work. However, because these queries all
> took ~1s in your serialized case, I expect this is not your problem.
>
> 2. What does the network look like between the api machine and the DB?
>
> 3. What do the memory and CPU usage of the api process look like while
> this is happening?
>
> Related to #3, even though we issue the requests to the DB in parallel,
> we still process the result of those calls in series in a single python
> thread on the API. That means all the work of reading the data from the
> socket, constructing the SQLA objects, turning those into nova objects,
> etc, all happens serially. It could be that the DB query is really a
> small part of the overall time and our serialized python handling of the
> result is the slow part. If you see the api process pegging a single
> core at 100% for ten seconds, I think that's likely what is happening.
>
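To make the quoted point concrete: if the queries overlap but the result handling stays in one Python thread, the wall time is roughly the slowest query plus the sum of all per-cell handling. Below is a minimal sketch of that arithmetic. The helper name is hypothetical and the numbers are made up for illustration; they are not measurements from our test.

```python
def estimate_wall_time(query_seconds, processing_seconds):
    """Rough model: DB queries overlap, result handling does not."""
    parallel_wait = max(query_seconds)      # all cells are queried concurrently
    serial_work = sum(processing_seconds)   # one Python thread handles every result
    return parallel_wait + serial_work


# Illustrative numbers only (assumptions, not data from this thread).
cells = 10
queries = [0.125] * cells     # hypothetical time each DB query spends waiting
processing = [1.0] * cells    # hypothetical Python-side cost per cell

serial_total = sum(queries) + sum(processing)             # everything in series
parallel_total = estimate_wall_time(queries, processing)  # only the queries overlap

print(serial_total, parallel_total)
```

When the Python-side handling dominates like this, running the queries in parallel only removes the small query portion, so the total barely moves.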
I remember doing a test on SQLAlchemy: constructing the SQLAlchemy objects was much slower than fetching the data from the remote server. Maybe you can try profiling it to figure out how much time is spent on the wire and how much is spent constructing the objects.

http://docs.sqlalchemy.org/en/latest/faq/performance.html

> > so, now the queries are in parallel, but the whole thing still seems
> > serial.
>
> In your table, you show the time for "1 cell, 1000 instances" as ~3s and
> "10 cells, 1000 instances" as 10s. The problem with comparing those
> directly is that in the latter, you're actually pulling 10,000 records
> over the network, into memory, processing them, and then just returning
> the first 1000 from the sort. A closer comparison would be the "10
> cells, 100 instances" with "1 cell, 1000 instances". In both of those
> cases, you pull 1000 instances total from the db, into memory, and
> return 1000 from the sort. In that case, the multi-cell situation is
> faster (~2.3s vs. ~3.1s). You could also compare the "10 cells, 1000
> instances" case to "1 cell, 10,000 instances" just to confirm at the
> larger scale that it's better or at least the same.
>
> We _have_ to pull $limit instances from each cell, in case (according to
> the sort key) the first $limit instances are all in one cell. We _could_
> try to batch the results from each cell to avoid loading so many that we
> don't need, but we punted this as an optimization to be done later. I'm
> not sure it's really worth the complexity at this point, but it's
> something we could investigate.
>
> --Dan
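To illustrate why each cell has to return up to $limit records before the merge, here is a minimal sketch. It is not Nova's actual code: list_across_cells and the in-memory lists standing in for cell databases are hypothetical names I made up, and the merge uses heapq.merge from the Python standard library.

```python
import heapq
from itertools import islice


def list_across_cells(cells, limit, key):
    """Return the first `limit` records across all cells, ordered by `key`."""
    # Each cell must contribute up to `limit` records, because the global
    # first `limit` records may all live in a single cell.
    per_cell = [sorted(records, key=key)[:limit] for records in cells]
    # heapq.merge lazily merges already-sorted inputs; islice stops at `limit`.
    return list(islice(heapq.merge(*per_cell, key=key), limit))


# Hypothetical data: (sort_key, name) tuples spread over three cells.
cells = [
    [(1, "a1"), (2, "a2"), (3, "a3"), (4, "a4")],
    [(10, "b1"), (11, "b2")],
    [(20, "c1")],
]

# All three results come from the first cell, so a smaller per-cell
# limit would have returned the wrong records.
first_three = list_across_cells(cells, limit=3, key=lambda r: r[0])
print(first_three)
```

With 10 cells and a limit of 1000, this approach loads up to 10,000 records in order to return 1000, which is the overhead described in the quoted text.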
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev