While we are on the topic of api performance and the database, I have a few thoughts I'd like to share.

TL;DR:
 - we should consider refactoring our wsgi server to leverage multiple processors
 - we could leverage compute-cell database responsibility separation to speed up our api database performance by several orders of magnitude

I think the main way eventlet holds us back right now is that we have such low utilization. The big jump with multiprocessing or threading would be the potential to leverage more powerful hardware. Currently nova-api probably wouldn't run any faster on bare metal than it would run on an m1.tiny. Of course, this isn't an eventlet limitation per se; rather, our wsgi server implementation limits us to what eventlet can do in a single process.

However, the greatest performance improvement I see would come from streamlining the database interactions incurred on each nova-api request. We have been pretty fast-and-loose with adding database and glance calls to the openstack api controllers and compute api. I am especially thinking of the extension mechanism, which tends to require another database call for each /servers extension a deployer chooses to enable.

But, if we think in ideal terms, each api request should perform no more than 1 database call for queries, and no more than 2 db calls for commands (validation + initial creation). In addition, I can imagine an implementation where these database calls don't have any joins, and involve no more than one network roundtrip.

Beyond refactoring the way we add in data for response extensions, I think the right way to get this database performance is to make the compute-cells approach the norm. In this approach, there are at least two nova databases: one that lives alongside the nova-api nodes, and one that lives in a compute cell. The api database is kept up to date through asynchronous updates that bubble up from the compute cells.
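
A minimal sketch of the two-database idea, in Python with an in-memory sqlite database standing in for the api database. Everything named here is an assumption made for illustration and is not existing nova code: the api_servers table, the shape of the update messages, and the helper functions are all hypothetical. It only shows the read path the paragraph above describes: updates from a cell are applied in the background, and a show request is answered with one primary-key lookup and no joins.

```python
import json
import queue
import sqlite3

# Hypothetical denormalized api-side schema: one row per server, holding
# everything a "show server" response needs, already serialized.
API_SCHEMA = """
CREATE TABLE api_servers (
    uuid TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    response_json TEXT NOT NULL
)
"""


def apply_cell_update(conn, update):
    """Upsert one update that bubbled up from a compute cell."""
    conn.execute(
        "INSERT OR REPLACE INTO api_servers (uuid, project_id, response_json) "
        "VALUES (?, ?, ?)",
        (update["uuid"], update["project_id"], json.dumps(update["server"])),
    )
    conn.commit()


def drain_updates(conn, updates):
    """Apply every queued update; stands in for a background consumer."""
    applied = 0
    while True:
        try:
            update = updates.get_nowait()
        except queue.Empty:
            return applied
        apply_cell_update(conn, update)
        applied += 1


def show_server(conn, uuid):
    """Answer a show request with a single lookup and no joins."""
    row = conn.execute(
        "SELECT response_json FROM api_servers WHERE uuid = ?", (uuid,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(API_SCHEMA)
    updates = queue.Queue()
    flavor = {"name": "m1.tiny"}
    # Two updates for the same (made-up) server; the later one wins.
    for status in ("BUILD", "ACTIVE"):
        updates.put({
            "uuid": "abc",
            "project_id": "p1",
            "server": {"id": "abc", "status": status, "flavor": flavor},
        })
    drain_updates(conn, updates)
    return show_server(conn, "abc")


if __name__ == "__main__":
    print(demo())
```

The flavor data is stored inline with the server row on purpose: that duplication is the denormalization, and it is what lets the query path avoid a join. A real deployment would of course need a durable queue and a real database rather than these stand-ins.
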
With this separation, we are free to tailor the schema of the api database to match api performance needs, while we tailor the schema of the compute cell database to the operational requirements of compute workers. In particular, we can completely denormalize the tables in the api database without creating unpleasant side effects in the compute manager code. This denormalization means both fewer database interactions and fewer joins (which likely matters for larger deployments).

If we pair this streamlining and denormalization approach with similar attention to glance performance and an rpc implementation that writes to disk and returns, processing network activity in the background, I think we could get most api actions to < 10 ms on reasonable hardware.

As much as the initial push on compute-cells is about scale, I think it could enable major performance improvements directly on its heels during the Folsom cycle.

This is something I'd love to talk about more at the conference if anyone has any interest.