On Thu, Apr 14, 2011 at 2:08 AM, Greg Smith <g...@2ndquadrant.com> wrote:
> This week several list regulars here waded into the MySQL Convention. I
> decided to revisit PostgreSQL vs. MySQL performance using the sysbench
> program as part of that. It's not important to what I'm going to describe
> to understand exactly what statements sysbench runs here or how to use it,
> but if anyone is curious I've got some more details about how I ran the
> tests in my talk slides at http://projects.2ndquadrant.com/talks The
> program has recently gone through some fixes that make it run a bit better
> both in general and against PostgreSQL. The write tests are still broken
> against PostgreSQL, but it now seems to do a reasonable job simulating a
> simple SELECT-only workload. A fix from Jignesh recently made its way into
> the database generation side of the code that makes it less tedious to
> test with it too.
>
> The interesting part was how per-client scaling compared between the two
> databases; graph attached. On my 8 core server, PostgreSQL scales nicely
> up to a steady 50K TPS. I see the same curve, almost identical numbers,
> with PostgreSQL and pgbench--no reason to suspect sysbench is doing
> anything shady. The version of MySQL I used hits around 67K TPS with
> innodb when busy with lots of clients. That part doesn't bother me; nobody
> expects PostgreSQL to be faster on trivial SELECT statements and the gap
> isn't that big.
>
> The shocking part was the single client results. I'm used to seeing
> Postgres get around 7K TPS per core on those, which was the case here, and
> I never considered that an interesting limitation to think about before.
> MySQL turns out to hit 38K TPS doing the same work. Now that's a gap
> interesting enough to make me wonder what's going on.
> Easy enough to exercise the same sort of single client test case with
> pgbench and put it under a profiler:
>
> sudo opcontrol --init
> sudo opcontrol --setup --no-vmlinux
> createdb pgbench
> pgbench -i -s 10 pgbench
> psql -d pgbench -c "vacuum"
> sudo opcontrol --start
> sudo opcontrol --reset
> pgbench -S -n -c 1 -T 60 pgbench
> sudo opcontrol --dump ; sudo opcontrol --shutdown
> opreport -l image:$HOME/pgwork/inst/test/bin/postgres
>
> Here's the top calls, from my laptop rather than the server that I
> generated the graph against. It does around 5.5K TPS with 1 client and
> 10K with 2 clients, so same basic scaling:
>
> samples   %       image name   symbol name
> 53548     6.7609  postgres     AllocSetAlloc
> 32787     4.1396  postgres     MemoryContextAllocZeroAligned
> 26330     3.3244  postgres     base_yyparse
> 21723     2.7427  postgres     hash_search_with_hash_value
> 20831     2.6301  postgres     SearchCatCache
> 19094     2.4108  postgres     hash_seq_search
> 18402     2.3234  postgres     hash_any
> 15975     2.0170  postgres     AllocSetFreeIndex
> 14205     1.7935  postgres     _bt_compare
> 13370     1.6881  postgres     core_yylex
> 10455     1.3200  postgres     MemoryContextAlloc
> 10330     1.3042  postgres     LockAcquireExtended
> 10197     1.2875  postgres     ScanKeywordLookup
>  9312     1.1757  postgres     MemoryContextAllocZero
>
> I don't know nearly enough about the memory allocator to comment on
> whether it's possible to optimize it better for this case to relieve any
> bottleneck. Might just get a small gain then push the limiter to the
> parser or hash functions. I was surprised to find that's where so much of
> the time was going though.
>
> P.S. When showing this graph in my talk, I pointed out that anyone who is
> making decisions about which database to use based on trivial SELECTs on
> small databases isn't going to be choosing between PostgreSQL and MySQL
> anyway--they'll be deploying something like MongoDB instead if that's the
> important metric.
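Given how high base_yyparse and core_yylex rank in that profile, one quick way to estimate how much of the single-client ceiling is raw parse cost is to rerun the same SELECT-only test with prepared statements, which parse and plan once and then only bind/execute per call. A minimal sketch, assuming the same pgbench database built above; the -M flag selects pgbench's query protocol mode:

```shell
# Baseline: simple protocol, the statement is parsed on every execution
pgbench -S -n -c 1 -T 60 -M simple pgbench

# Prepared statements: parse/plan once, then bind/execute per transaction
pgbench -S -n -c 1 -T 60 -M prepared pgbench
```

The difference between the two TPS numbers gives a rough upper bound on what the parser and planner contribute per query in this workload; the allocator calls that feed them should shrink proportionally.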
On my workstation VM, I get:

6.7k selects, single client
12k selects piped through a single-user backend
13.5k piped through a single-user backend, one transaction
23k via plpgsql 'execute' in a loop (which is really two queries: one to
build the query and one to execute it)
100k via non-dynamic query in plpgsql in a loop

In addition to parsing and planning, the network and the protocol really
factor in. This is why I'm so keen on being able to inject queries directly
into the backend via stored procedure. I'm also really curious how a
plpgsql-ish language performs when managing transactions itself.

libpqtypes was written so you could stage data on the client and hand it
off to the backend and act on it in the plpgsql world. It works really
well -- you can amortize the turnaround losses through the network over a
lot of data.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
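The gap between the dynamic-EXECUTE and static-query loops described above can be reproduced with something like the following sketch (hypothetical function names, run against a pgbench database; requires a live server, so treat it as illustrative). EXECUTE forces a fresh parse and plan on every iteration, while the static query's plan is cached after the first one:

```shell
psql -d pgbench <<'EOF'
-- Static query in a loop: plan is cached and reused across iterations
CREATE OR REPLACE FUNCTION bench_static(n int) RETURNS void AS $$
DECLARE
    r int;
BEGIN
    FOR i IN 1..n LOOP
        SELECT abalance INTO r FROM pgbench_accounts WHERE aid = i;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Dynamic query in a loop: EXECUTE re-parses and re-plans every time
CREATE OR REPLACE FUNCTION bench_dynamic(n int) RETURNS void AS $$
DECLARE
    r int;
BEGIN
    FOR i IN 1..n LOOP
        EXECUTE 'SELECT abalance FROM pgbench_accounts WHERE aid = $1'
            INTO r USING i;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

\timing on
SELECT bench_static(100000);
SELECT bench_dynamic(100000);
EOF
```

Because both loops run entirely inside one backend, the comparison isolates parse/plan overhead from the network and protocol costs discussed above.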