Working on this independently, I created a branch with some simple performance logging around nova-api as a whole, and individually around glance, nova.db, and nova.rpc calls. (Sorry, I only have a local copy, and it's on a different computer right now and probably needs a rebase. I will rebase and publish it on GitHub tomorrow.)
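The instrumentation itself is nothing fancy; it is essentially a wall-clock timing wrapper around the calls of interest, roughly along these lines (an illustrative sketch only, not the actual branch code; the decorator and logger names here are made up):

    import functools
    import logging
    import time

    LOG = logging.getLogger('nova.perflog')

    def timed(key):
        """Log the wall-clock duration of the wrapped call under 'key'."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.time()
                try:
                    return func(*args, **kwargs)
                finally:
                    LOG.info('%s %0.3f', key, time.time() - start)
            return wrapper
        return decorator

    # Example: wrap a db api call so every invocation emits a timing record
    # that the analysis script can later aggregate by key.
    # instance_update = timed('nova.db.api.instance_update')(instance_update)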
With this logging, I could get some simple profiling that I found very useful. Here is a GitHub project with the analysis code, as well as some nova-api logs I was using as input: https://github.com/markwash/nova-perflog

With these tools you can get a wall-time profile for individual requests. For example, looking at one server create request (you can run this directly from the checkout, as the logs are saved there):

    markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
    key                                          count  avg
    nova.api.openstack.wsgi.POST                 1      0.657
    nova.db.api.instance_update                  1      0.191
    nova.image.show                              1      0.179
    nova.db.api.instance_add_security_group      1      0.082
    nova.rpc.cast                                1      0.059
    nova.db.api.instance_get_all_by_filters      1      0.034
    nova.db.api.security_group_get_by_name       2      0.029
    nova.db.api.instance_create                  1      0.011
    nova.db.api.quota_get_all_by_project         3      0.003
    nova.db.api.instance_data_get_for_project    1      0.003

    key                                          count  total
    nova.api.openstack.wsgi                      1      0.657
    nova.db.api                                  10     0.388
    nova.image                                   1      0.179
    nova.rpc                                     1      0.059

All times are in seconds. The nova.rpc time is probably high because this was the first call after a server restart, so it likely includes the connection handshake. These numbers are also probably about 1.5 months stale.

The conclusion I reached from this profiling is that we just plain overuse the db (and we might do the same in glance). For example, whenever we do an update, we actually re-retrieve the item from the database, update its dictionary, and save it, which is double the cost it needs to be (there is a rough sketch of the two patterns at the bottom of this message). We also handle updates for data spread across tables inefficiently, where they could be handled in a single database round trip.

In the case of server listings in particular, extensions are just rough on performance: most extensions hit the database again at least once. This isn't really so bad, but it is clearly an area where we should improve, since these are the most frequent api queries.

I just see a ton of specific performance problems that are easier to address one by one, rather than diving into a general (albeit obvious) solution such as caching.

"Sandy Walsh" <sandy.wa...@rackspace.com> said:

> We're doing tests to find out where the bottlenecks are, caching is the
> most obvious solution, but there may be others. Tools like memcache do a
> really good job of sharing memory across servers so we don't have to
> reinvent the wheel or hit the db at all.
>
> In addition to looking into caching technologies/approaches we're gluing
> together some tools for finding those bottlenecks. Our first step will
> be finding them, then squashing them ... however.
>
> -S
>
> On 03/22/2012 06:25 PM, Mark Washenberger wrote:
>> What problems are caching strategies supposed to solve?
>>
>> On the nova compute side, it seems like streamlining db access and
>> api-view tables would solve any performance problems caching would
>> address, while keeping the stale data management problem small.
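To make the update point above a bit more concrete, here is a toy sketch of the two patterns in plain SQLAlchemy. This is not nova code; the model and column names are made up for illustration:

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = 'instances'
        id = Column(Integer, primary_key=True)
        vm_state = Column(String)

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add(Instance(id=1, vm_state='building'))
    session.commit()

    # The pattern we use today: re-retrieve the row, mutate it, write it
    # back. That is one SELECT round trip plus one UPDATE round trip.
    inst = session.query(Instance).filter_by(id=1).one()
    inst.vm_state = 'active'
    session.commit()

    # The same change issued as a single UPDATE, with no prior SELECT.
    session.query(Instance).filter_by(id=1).update({'vm_state': 'active'})
    session.commit()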