Jim Nasby wrote:
Wow, that's the kind of thing that would be incredibly difficult to figure out,
especially while your production system is in flames... Can we change the
ereport that happens in that case from DEBUG1 to WARNING? Or provide some
other means to track it?

That's why we already added pg_stat_bgwriter.buffers_backend_fsync: to track the problem before trying to improve it. It was driving me crazy not having any visibility into when it happened on a production server. I haven't seen that we need anything beyond that so far. In the context of this new patch, for example, if you get to the point where a backend does its own sync, you'll know it did a compaction as part of that. The existing statistic would tell you enough.
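
If you want to watch that counter live rather than just in the report at the end
of a test run, polling the view from a shell loop is enough. A rough sketch
(connection options and the sampling interval are whatever fits your setup):

# Poll the backend fsync counter every 5 seconds; adjust the psql connection
# options (-h/-p/-U/-d) to match your server.
while true; do
    psql -At -c "SELECT now(), buffers_backend, buffers_backend_fsync FROM pg_stat_bgwriter;"
    sleep 5
done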

There's now enough data in test set 3 at http://www.2ndquadrant.us/pgbench-results/index.htm to start to see how this breaks down on a moderately big system (well, by most people's standards, though not Jim's, for whom this is still a toy). Note the backend_sync column on the right, at the very end of the page; that's the relevant counter I'm commenting on:

scale=175: Some backend fsync with 64 clients, on 2 of the 3 runs.
scale=250: Significant backend fsync with 32 and 64 clients, every run.
scale=500: Moderate to large backend fsync at any client count >=16. This seems to be the worst spot of those mapped. Above here, I would guess the TPS numbers fall enough that the fsync request queue activity drops off, too.
scale=1000: Backend fsync starting at 8 clients.
scale=2000: Backend fsync starting at 16 clients. By here I think the TPS volumes are getting low enough that clients are stuck waiting on seeks significantly more often than on fsync.

Looks like the most effective spot for me to focus testing with this server is scales of 500 and 1000, with 16 to 64 clients. Now that I've got the scale range fine-tuned better, I may crank up the client counts too and see what that does. I'm glad these are appearing in reasonable volume here, though; I was starting to get nervous about only having NDA-restricted results to work against. Some days you just have to cough up for your own hardware.

I just tagged pgbench-tools-0.6.0 and pushed it to GitHub/git.postgresql.org with the changes that track and report on buffers_backend_fsync, if anyone else wants to try this out. It includes those numbers if you're running a 9.1 that has them; otherwise it just reports 0 all the time, since detecting whether the feature exists wasn't hard to add. The end portion of a config file for the program (the first part specifies host/username info and the like) that would replicate the third test set here is:

MAX_WORKERS="4"
SCRIPT="tpc-b.sql"
SCALES="1 10 100 175 250 500 1000 2000"
SETCLIENTS="4 8 16 32 64"
SETTIMES=3
RUNTIME=600
TOTTRANS=""
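
If you haven't run the program before, the sequence after editing the config is
roughly the following; check the README in your checkout for the exact script
names in case they've changed:

# From the pgbench-tools directory, once the config above is in place
./runset       # runs every SCALES x SETCLIENTS combination, SETTIMES each
./webreport    # regenerates the HTML results pages like the one linked above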

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

