Robert Haas <robertmh...@gmail.com> wrote:

> we only fsync() at end-of-checkpoint.  So we'd have to think about
> what to fsync, and how often, to keep the double-write buffer to a
> manageable size.

I think this is the big tuning challenge with this technology.

> I can't help thinking that any extra fsyncs are pretty expensive,
> though, especially if you have to fsync() every file that's been
> double-written before clearing the buffer.  Possibly we could have
> 2^N separate buffers based on an N-bit hash of the relfilenode and
> segment number, so that we could just fsync 1/(2^N)-th of the open
> files at a time.

I'm not sure I'm following -- we would just be fsyncing those files
we actually wrote pages into, right?  Not all segments for the table
involved?

> But even that sounds expensive: writing back lots of dirty data
> isn't cheap.  One of the systems I've been doing performance
> testing on can sometimes take >15 seconds to write a shutdown
> checkpoint,

Consider the relation-file fsyncs for double-write as a form of
checkpoint spreading, and maybe it won't seem so bad.  It should
make that shutdown checkpoint less painful.  Now, I have been
thinking that on a write-heavy system you had better have a BBU
write-back cache, but that's my recommendation anyway.

> and I'm sure that other people have similar (and worse) problems.

Well, I have no doubt that this feature should be optional.  Those
who prefer can continue to do full-page writes to the WAL, instead.
Or take the "running with scissors" approach.

-Kevin
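
P.S.  Just to make sure I'm picturing the same thing: a rough sketch
of the 1/(2^N) bucketing could look something like the function
below.  The names, the hash, and the choice of N here are made up
for illustration only; nothing in the tree looks like this.

#include <stdint.h>

#define DW_BUCKET_BITS  3                   /* N; 2^3 = 8 buckets, for example */
#define DW_NUM_BUCKETS  (1 << DW_BUCKET_BITS)

/*
 * Map a (relfilenode, segment number) pair onto one of 2^N flush
 * buckets, so that clearing the double-write buffer only has to
 * fsync the files whose bucket is being flushed.
 */
static unsigned int
dw_bucket_for(uint32_t relfilenode, uint32_t segno)
{
    /*
     * Any reasonable N-bit mix of the pair would do; this is Knuth's
     * multiplicative hash of the relfilenode xor'd with the segment
     * number.
     */
    uint32_t    h = relfilenode * 2654435761u ^ segno;

    return h & (DW_NUM_BUCKETS - 1);
}

The masking works because DW_NUM_BUCKETS is a power of two; each
bucket would then only name the files actually double-written into
since that bucket was last flushed, which is where my question above
comes from.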