Tom Lane wrote:
I wonder whether it'd be useful to keep track of the total amount of data written-and-not-yet-synced, and to issue fsyncs often enough to keep that below some parameter; the idea being that the parameter would limit how much dirty kernel disk cache there is. Of course, ideally the kernel would have a similar tunable and this would be a waste of effort on our part...
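For concreteness, here is a minimal sketch of the bookkeeping described above: count bytes written since the last sync, and fsync once the count reaches a limit. The class name `SyncThrottledWriter` and the `max_dirty_bytes` parameter are hypothetical, made up for illustration; this is Python rather than backend C, and it only tracks one file descriptor in one process, so it says nothing about dirty data other writers have put into the kernel cache.

```python
import os
import tempfile


class SyncThrottledWriter:
    """Hypothetical helper: fsync once unsynced bytes reach max_dirty_bytes."""

    def __init__(self, fd, max_dirty_bytes):
        self.fd = fd
        self.max_dirty_bytes = max_dirty_bytes  # assumed tunable, not a real GUC
        self.unsynced = 0      # bytes written since the last fsync
        self.fsync_calls = 0   # how many times we forced a sync

    def write(self, data):
        written = os.write(self.fd, data)
        self.unsynced += written
        # Force the data out once the written-and-not-yet-synced total hits the cap.
        if self.unsynced >= self.max_dirty_bytes:
            os.fsync(self.fd)
            self.fsync_calls += 1
            self.unsynced = 0
        return written


def demo():
    # Write 32 blocks of 8 kB with a 64 kB cap on unsynced data.
    with tempfile.TemporaryFile() as f:
        writer = SyncThrottledWriter(f.fileno(), max_dirty_bytes=64 * 1024)
        for _ in range(32):
            writer.write(b"\0" * 8192)
        return writer.fsync_calls, writer.unsynced
```

With those numbers the cap is reached every eighth write, so the trade-off is easy to see: a smaller cap means more frequent fsyncs and less data queued for any one of them.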
I wanted to run the tests again before reporting in detail here, because the results are so bad, but I threw out an initial report about trying to push this down to be the kernel's job at http://blog.2ndquadrant.com/en/2011/01/tuning-linux-for-low-postgresq.html
So far it looks like the newish Linux dirty_bytes parameter works well at reducing latency, by limiting how much dirty data can pile up before it gets nudged heavily toward disk. But the throughput drop you pay on VACUUM in particular is brutal; I'm seeing over a 50% slowdown in some cases. I suspect we need to let the regular cleaner and backend writes queue up in the largest possible cache for VACUUM, so it benefits as much as possible from elevator sorting of writes. I also suspect that VACUUM being the worst case for a tightly controlled write cache is an unintended side-effect of the ring buffer implementation it uses now.
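If you want to check what a test box is currently set to, something like the following rough sketch would do it. It reads the writeback limits Linux exposes under /proc/sys/vm; the function name `read_dirty_limits` and its `sysctl_dir` argument are my own invention for illustration, and a setting that can't be read is reported as None. Note that when dirty_bytes is nonzero the kernel uses it in place of dirty_ratio, and likewise for the background pair.

```python
from pathlib import Path


def read_dirty_limits(sysctl_dir="/proc/sys/vm"):
    """Return the kernel dirty-page writeback limits as a dict (None if unreadable)."""
    names = (
        "dirty_bytes",
        "dirty_background_bytes",
        "dirty_ratio",
        "dirty_background_ratio",
    )
    limits = {}
    for name in names:
        path = Path(sysctl_dir) / name
        try:
            limits[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Missing file (old kernel, non-Linux) or unexpected contents.
            limits[name] = None
    return limits


if __name__ == "__main__":
    for key, value in read_dirty_limits().items():
        print(f"vm.{key} = {value}")
```

This only reports the settings; changing them needs root and is a separate step.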
Right now I'm running the same tests on XFS instead of ext3, and those are just way more sensible all around; I'll revisit this on that filesystem and ext4. The scale=500 tests I've been running a lot lately are a full 3X faster in TPS on XFS relative to ext3, with about 1/8 as much worst-case latency.
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books