Great suggestions, guys ... we'll give some thought to how the community can share and compare performance measurements in a consistent way.
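
One thing that would help: if every timing wrapper emitted one structured line per call (JSON, say), the same analysis scripts would work on anyone's logs and the numbers would be directly comparable between deployments. A rough sketch of the idea, purely illustrative -- the decorator and the names in it are made up, nothing that exists in the tree today:

import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger('nova.perf')


def timed(key):
    """Log the wall-clock time of the wrapped call as one JSON line."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return func(*args, **kwargs)
            finally:
                LOG.info(json.dumps({'key': key,
                                     'seconds': round(time.time() - start, 3)}))
        return wrapper
    return decorator


# Hypothetical usage on whatever boundary we care about:
@timed('nova.db.api.instance_update')
def instance_update(context, instance_uuid, values):
    time.sleep(0.05)  # stand-in for the real DB call

instance_update(None, 'fake-uuid', {'vm_state': 'active'})

Per-request aggregation (grouping those lines by request id, along the lines of the profile Mark posted below) could then live in a small shared script.
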
-S

On 03/23/2012 07:26 PM, Joe Gordon wrote:
> +1
>
> Documenting these findings would be nice too.
>
> best,
> Joe
>
> On Fri, Mar 23, 2012 at 2:15 PM, Justin Santa Barbara
> <jus...@fathomdb.com <mailto:jus...@fathomdb.com>> wrote:
>
> This is great: hard numbers are exactly what we need. I would love
> to see a statement-by-statement SQL log with timings from someone
> that has a performance issue. I'm happy to look into any DB
> problems it demonstrates.
>
> The nova database is small enough that it should always be in memory
> (if you're running a million VMs, I don't think asking for one
> gigabyte of RAM on your DB is unreasonable!)
>
> If it isn't hitting disk, PostgreSQL or MySQL with InnoDB can serve
> 10k 'indexed' requests per second through SQL on a low-end (<$1000)
> box. With tuning you can get 10x that. Using one of the SQL bypass
> engines (e.g. MySQL HandlerSocket) can supposedly give you 10x
> again. Throwing money at the problem in the form of multi-processor
> boxes (or disks if you're I/O bound) can probably get you 10x again.
>
> However, if you put a DB on a remote host, you'll have to wait for a
> network round-trip per query. If your ORM is doing a 1+N query, the
> total read time will be slow. If your DB is doing a sync on every
> write, writes will be slow. If the DB isn't tuned with a sensible
> amount of cache (at least as big as the DB size), it will be
> slow(er). Each of these has a very simple fix for OpenStack.
>
> Relational databases have very efficient caching mechanisms built
> in. Any out-of-process cache will have a hard time beating them.
> Let's make sure the bottleneck is the DB, and not (for example)
> RabbitMQ, before we go off on a huge rearchitecture.
>
> Justin
>
> On Thu, Mar 22, 2012 at 7:53 PM, Mark Washenberger
> <mark.washenber...@rackspace.com
> <mailto:mark.washenber...@rackspace.com>> wrote:
>
> Working on this independently, I created a branch with some simple
> performance logging around the nova-api, and individually around
> glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
> copy and it's on a different computer right now, and it probably
> needs a rebase. I will rebase and publish it on GitHub tomorrow.)
>
> With this logging, I could get some simple profiling that I found
> very useful. Here is a GitHub project with the analysis code as well
> as some nova-api logs I was using as input:
>
> https://github.com/markwash/nova-perflog
>
> With these tools, you can get a wall-time profile for individual
> requests. For example, looking at one server create request (you
> can run this directly from the checkout, as the logs are saved
> there):
>
> markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
>
> key                                         count  avg
> nova.api.openstack.wsgi.POST                    1  0.657
> nova.db.api.instance_update                     1  0.191
> nova.image.show                                 1  0.179
> nova.db.api.instance_add_security_group        1  0.082
> nova.rpc.cast                                   1  0.059
> nova.db.api.instance_get_all_by_filters        1  0.034
> nova.db.api.security_group_get_by_name         2  0.029
> nova.db.api.instance_create                     1  0.011
> nova.db.api.quota_get_all_by_project            3  0.003
> nova.db.api.instance_data_get_for_project      1  0.003
>
> key                        count  total
> nova.api.openstack.wsgi        1  0.657
> nova.db.api                   10  0.388
> nova.image                     1  0.179
> nova.rpc                       1  0.059
>
> All times are in seconds. The nova.rpc time is probably high
> since this was the first call since the server restart, so the
> connection handshake is probably included. This is also probably
> 1.5 months stale.
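
Interjecting on Justin's 1+N point above, since it is exactly the kind of thing that shows up in a per-request profile like this. With SQLAlchemy, the usual fix is to eager-load the related rows instead of letting a lazy relationship fire once per row inside a loop. The sketch below is purely illustrative: Instance and SecurityGroup are stand-in models and the in-memory SQLite engine is only there to make it runnable; it is not the real nova.db code.

# Stand-in models for illustration only (not the real nova schema).
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import joinedload, relationship, sessionmaker

Base = declarative_base()


class Instance(Base):
    __tablename__ = 'instances'
    id = Column(Integer, primary_key=True)
    hostname = Column(String(255))
    security_groups = relationship('SecurityGroup', backref='instance')


class SecurityGroup(Base):
    __tablename__ = 'security_groups'
    id = Column(Integer, primary_key=True)
    instance_id = Column(Integer, ForeignKey('instances.id'))
    name = Column(String(255))


engine = create_engine('sqlite://', echo=True)  # echo=True logs each statement
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([Instance(hostname='vm-%d' % i,
                          security_groups=[SecurityGroup(name='default')])
                 for i in range(3)])
session.commit()

# 1+N pattern: one SELECT for the instance list, then one more SELECT per
# instance when the lazy 'security_groups' relationship fires in the loop.
for inst in session.query(Instance).all():
    list(inst.security_groups)

# Eager-load instead: a single SELECT with a JOIN, one round trip.
query = session.query(Instance).options(joinedload(Instance.security_groups))
for inst in query.all():
    list(inst.security_groups)  # already loaded, no extra query

With echo=True you can watch the first loop issue one extra SELECT per instance and the second loop issue a single JOINed SELECT.
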
> The conclusion I reached from this profiling is that we just plain
> overuse the db (and we might do the same in glance). For example,
> whenever we do updates, we actually re-retrieve the item from the
> database, update its dictionary, and save it. This is double the
> cost it needs to be. We also handle updates for data across tables
> inefficiently, where they could be handled in a single database
> round trip.
>
> In particular, in the case of server listings, extensions are just
> rough on performance. Most extensions hit the database again
> at least once. This isn't really so bad, but it clearly is an area
> where we should improve, since these are the most frequent api
> queries.
>
> I just see a ton of specific performance problems that are easier
> to address one by one, rather than diving into a general (albeit
> obvious) solution such as caching.
>
> "Sandy Walsh" <sandy.wa...@rackspace.com
> <mailto:sandy.wa...@rackspace.com>> said:
>
> > We're doing tests to find out where the bottlenecks are; caching is
> > the most obvious solution, but there may be others. Tools like
> > memcache do a really good job of sharing memory across servers so
> > we don't have to reinvent the wheel or hit the db at all.
> >
> > In addition to looking into caching technologies/approaches we're
> > gluing together some tools for finding those bottlenecks. Our first
> > step will be finding them, then squashing them ... however.
> >
> > -S
> >
> > On 03/22/2012 06:25 PM, Mark Washenberger wrote:
> >> What problems are caching strategies supposed to solve?
> >>
> >> On the nova compute side, it seems like streamlining db access and
> >> api-view tables would solve any performance problems caching would
> >> address, while keeping the stale data management problem small.
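
And one note on Mark's point above about re-retrieving the item before every update: in SQLAlchemy terms the difference is roughly the following. Again just a sketch against a made-up model, not the actual nova.db.sqlalchemy layer.

# Made-up Instance model for illustration only (not the real nova schema).
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Instance(Base):
    __tablename__ = 'instances'
    id = Column(Integer, primary_key=True)
    vm_state = Column(String(255))


engine = create_engine('sqlite://', echo=True)  # echo=True shows the SQL issued
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Instance(id=1, vm_state='building'))
session.commit()

# Fetch-modify-save (two round trips): SELECT the row, mutate it, then
# flush an UPDATE on commit.
inst = session.query(Instance).get(1)
inst.vm_state = 'active'
session.commit()

# Direct update (one round trip): a single UPDATE ... WHERE, no SELECT first.
session.query(Instance).filter_by(id=1).update({'vm_state': 'error'})
session.commit()

The second form is the kind of single-round-trip write Mark is describing.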