On 1/17/14 10:37 AM, Mel Gorman wrote:
> There is not an easy way to tell. To be 100% certain, it would require an instrumentation patch or a systemtap script to detect when a particular page is being written back and track the context. There are approximations, though. Monitor nr_dirty pages over time.

I have a benchmarking wrapper for the pgbench testing program called pgbench-tools: https://github.com/gregs1104/pgbench-tools  As of October, on Linux it now plots the "Dirty" value from /proc/meminfo over time, on the same time axis as the transaction latency data. The report at the end includes things like the maximum amount of dirty memory observed during the test sampling. That doesn't tell you exactly what's happening at the level someone reworking the kernel logic might want, but you can easily see things like the database's checkpoint cycle reflected in the dirty memory total.

This works really well for monitoring production servers too. I have a lot of data from a plugin for the Munin monitoring system that plots the same way. Once you have some history about what's normal, it's easy to see when a system is falling behind on writes, and the dirty memory high-water mark often correlates with the periods of bad responsiveness.
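The sampling side of that is trivial. Here's a minimal sketch of the idea (not the actual pgbench-tools code); it just timestamps the Dirty line from /proc/meminfo once a second so it can be lined up against latency data afterwards:

    #!/bin/sh
    # Sample dirty page memory once a second, timestamped so the samples
    # can later be correlated with transaction latency data.
    while true; do
        echo "$(date +%s) $(awk '/^Dirty:/ {print $2}' /proc/meminfo) kB"
        sleep 1
    done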

Another recent change is that pgbench for the upcoming PostgreSQL 9.4 now allows you to specify a target transaction rate. Seeing the write latency behavior with that in place is far more interesting than anything we were able to watch with pgbench before. The pgbench write tests we've been doing for years mainly told you the throughput rate when all of the caches were always as full as the database could make them, and tuning for that is not very useful. Turns out it's far more interesting to run at 50% of what the storage is capable of, then watch what happens to latency when you adjust things like the dirty_* parameters.
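As an illustration, if a given write test tops out around 800 TPS, a rate-limited run at half that looks like the following. The client count, duration, and database name here are made-up example values; --rate (or -R) is the new 9.4 option:

    # Run for 10 minutes at a target of 400 transactions/second
    pgbench -c 16 -j 4 -T 600 --rate=400 pgbench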

I've been working on the problem of how to build a benchmark test case that acts enough like a real, busy PostgreSQL server that we can share it with kernel developers, giving everyone an objective way to measure changes. These rate-limited tests are working much better for that than anything I came up with before.

I am skeptical that the database taking over very much of this work would perform better than the Linux kernel does. My take is that our most useful role would be providing test cases kernel developers can add to a performance regression suite. Ugly "we never thought that would happen" situations seem to be at the root of many of the kernel performance regressions people here get nailed by.

Effective I/O scheduling is very hard, and we are unlikely to ever out-innovate the kernel hacking community by pulling more of that work into the database. It's already possible to experiment with moving in that direction using nothing but tuning changes: use a larger database shared_buffers value, tweak checkpoints to spread their I/O out, and reduce things like dirty_ratio. I do some of that, but I've learned it's dangerous to wander too far in that direction.
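A sketch of that style of tuning, with hypothetical starting values that would need adjusting per workload:

    # postgresql.conf: shift more of the caching job into the database
    #   shared_buffers = 8GB
    #   checkpoint_completion_target = 0.9  # spread checkpoint writes over more time
    # sysctl: shrink the kernel's dirty memory allowance
    sysctl -w vm.dirty_ratio=10
    sysctl -w vm.dirty_background_ratio=5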

If instead you let Linux do even more of the work, giving it a lot of memory to manage and room to re-order I/O, that can work out quite well. For example, I've seen a lot of people try to keep latency down by using the deadline scheduler with very low settings for the expire times. The theory is great, but it has never worked out in the real world for me. Here's the sort of deadline tuning I deploy instead now:

    # DEV is assumed to point at the device's sysfs tree, e.g. DEV=/sys/block/sda
    echo 500      > ${DEV}/queue/iosched/read_expire     # service reads within 500ms (the default)
    echo 300000   > ${DEV}/queue/iosched/write_expire    # let writes sit queued for up to 5 minutes
    echo 1048576  > ${DEV}/queue/iosched/writes_starved  # effectively always prefer reads over writes

These numbers look insane compared to the defaults, but I assure you they're from a server that's happily chugging through 5 to 10K transactions/second around the clock. PostgreSQL forces writes out with fsync when they must go out, and this sort of tuning is basically giving up on the database managing writes beyond that; we really have no idea what order they should go out in. I just let the kernel have a large pile of work queued up, and trust that things like its block elevator and congestion code are smarter than the database can possibly be.
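One caveat worth noting: those iosched knobs only exist once deadline is the active scheduler for the device, which you can check and set the same way (the device name here is just an example):

    # Brackets in the output mark the active scheduler
    cat /sys/block/sda/queue/scheduler
    echo deadline > /sys/block/sda/queue/scheduler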

--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/

