Switching to a new thread for this summary since there's so much more generic info here...at this point I've finished exploring the major Linux filesystem and tuning options I wanted to as part of examining changes to the checkpoint code. All of the raw data is at http://www.2ndquadrant.us/pgbench-results/index.htm and the attached CSV file summarizes the more subtle and interesting results. Here are some highlights of what's been demonstrated there recently:

-On ext3, tuning the newish kernel tunables dirty_bytes and dirty_background_bytes down to a lower level than was possible with the older dirty_*ratio ones shows a significant reduction in maximum latency: it drops to about 1/4 of the worst-case behavior seen otherwise. Unfortunately, transactions per second take a 10-15% hit in the process. Not shown in that data is that the VACUUM cleanup time between tests slowed down badly too, running at around half the speed it does when the system has a full-size write cache.
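
If you want to poke at the same knobs, the change is just substituting the byte-based sysctls for the ratio-based ones. The values below are only an illustration of the idea, not the exact settings from these runs:

  # Illustrative values, not the ones used in these tests: start
  # background writeback at ~64MB of dirty data and block writers at
  # ~256MB, far below what even a 1% dirty_background_ratio works out
  # to on a large-memory server.
  sysctl -w vm.dirty_background_bytes=67108864
  sysctl -w vm.dirty_bytes=268435456
  # Setting either *_bytes value zeroes out its *_ratio counterpart
  # (and vice versa), so you can confirm which form is active with:
  sysctl vm.dirty_background_bytes vm.dirty_background_ratio vm.dirty_bytes vm.dirty_ratio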

-Switching from ext3 to XFS gave over a 3X speedup on the smaller test set: from the 600-700 TPS range up to around 2200 TPS. The TPS rate on the larger data set actually slowed down a touch on XFS, by around 10%. Still, such a huge win when it's better makes it easy to excuse the occasional cases where it's a bit slower. And the latency situation is just wildly better, which is the main thing that drove me toward using XFS more in the first place: anywhere from 1/6 to 1/25 of the worst-case latency seen on ext3. With abusively high client counts for this hardware you can still see >10 second pauses, but you don't see the >40 second ones ext3 hits at even moderate client counts.
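
The filesystem setup itself isn't anything exotic; the details aren't in this summary, so treat this as a generic sketch rather than the exact commands used here (device name, mount point, and options are all placeholders):

  # Placeholder device and mount point; options shown are generic,
  # not necessarily what these test systems used.
  mkfs.xfs /dev/sdb1
  mount -t xfs -o noatime /dev/sdb1 /var/lib/pgsql/data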

-Switching to the same lowered dirty_*bytes settings on XFS was negative in every way: TPS was cut in half, and maximum latency actually went up. Between this and the nasty VACUUM slowdown, I don't see much potential in these new tunables. They do lower latency a lot on ext3, but even there the penalty you pay for it is quite high. VACUUM in particular seems to really, really benefit from having a giant write cache to dump its work into, possibly because the ring buffer implementation avoids using the database's own cache for that work.
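
One way to see what's going on here is to watch the kernel's dirty-memory counters while the between-test VACUUM work runs; with the writeback limits forced that low, there just isn't much room left to absorb its writes:

  # Watch how much dirty data the kernel is holding and how much is
  # actively being written back, sampled once a second.
  watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'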

-Since earlier tests suggested sorting checkpoint writes gave little change on ext3, I started testing that on XFS instead. The results are a bit messy. At the lower scale, TPS went up a bit, but so did maximum latency. At the higher scale, TPS dropped in some cases (typically by less than 1%), while most of the latency results improved.
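
For anyone wanting to reproduce the general shape of these runs (including the sorting comparison in the table below): they're standard pgbench tests at scales 500 and 1000 with the client counts shown there. The durations, thread count, and database name below are placeholders rather than the exact harness settings, so this is just a simplified, roughly equivalent version:

  # Simplified approximation of one test cell (scale 500, 32 clients);
  # the -T duration and -j thread count here are placeholders.
  pgbench -i -s 500 pgbench
  pgbench -c 32 -j 4 -T 600 -l pgbench
  # The per-transaction latency log written by -l is where max_latency
  # comes from; its third column is the transaction time in microseconds.
  awk '$3 > max { max = $3 } END { printf "max latency: %.2f ms\n", max/1000 }' pgbench_log.*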

At this point I would say checkpoint sorting remains a wash: you can find workloads it benefits a little and others it penalizes a little. It's neutral enough on average that including it for other purposes is unlikely to be a really bad change for anyone. But I wouldn't want to see it committed by itself; there needs to be some additional benefit from the sorting before it's really worthwhile.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

"Compact fsync",,"ext3",,,"XFS + Regular Writes",,"Sorted Writes",,,,
"scale","clients","tps","max_latency","XFS Speedup","tps","max_latency","tps","max_latency","TPS Delta","%","Latency Delta"
500,16,631,17116.31,3.49,2201,1290.73,2210,2070.74,9,0.41%,780.01
500,32,655,24311.54,3.37,2205,1379.14,2357,1971.2,152,6.89%,592.06
500,64,727,38040.39,3.11,2263,1440.48,2332,1763.29,69,3.05%,322.81
500,128,687,48195.77,3.2,2201,1743.11,2221,2742.18,20,0.91%,999.07
500,256,747,46799.48,2.92,2184,2429.74,2171,2356.14,-13,-0.60%,-73.6
1000,16,321,40826.58,1.21,389,1586.17,386,1598.54,-3,-0.77%,12.37
1000,32,345,27910.51,0.91,314,2150.94,331,2078.02,17,5.41%,-72.91
1000,64,358,45138.1,0.94,336,6681.57,320,6469.71,-16,-4.76%,-211.87
1000,128,372,47125.46,0.88,328,8707.42,330,9037.63,2,0.61%,330.21
1000,256,350,83232.14,0.91,317,11973.35,315,11248.18,-2,-0.63%,-725.17