On Tue, Feb 1, 2011 at 12:58 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
> Robert Haas <robertmh...@gmail.com> wrote:
>
>> I also think Bruce's idea of calling fsync() on each relation just
>> *before* we start writing the pages from that relation might have
>> some merit.
>
> What bothers me about that is that you may have a lot of the same
> dirty pages in the OS cache as the PostgreSQL cache, and you've just
> ensured that the OS will write those *twice*.  I'm pretty sure that
> the reason the aggressive background writer settings we use have not
> caused any noticeable increase in OS disk writes is that many
> PostgreSQL writes of the same buffer keep an OS buffer page from
> becoming stale enough to get flushed until PostgreSQL writes to it
> taper off.  Calling fsync() right before doing "one last push" of
> the data could be really pessimal for some workloads.
I was thinking about what Greg reported here:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg01387.php

If the amount of pre-checkpoint dirty data is 3GB and the checkpoint
is writing 250MB, then you shouldn't have all that many extra
writes... but you might have some, and that might be enough to send
the whole thing down the tubes.

InnoDB apparently handles this problem by advancing the redo pointer
in small steps instead of in large jumps.  AIUI, in addition to
tracking the LSN of each page, they also track the first-dirtied LSN.
That lets you checkpoint to an arbitrary LSN by flushing just the
pages with an older first-dirtied LSN.  So instead of doing a
checkpoint every hour, you might do a mini-checkpoint every 10
minutes.  Since the mini-checkpoints each need to flush less data,
they should be less disruptive than a full checkpoint.  But that, too,
will generate some extra writes.

Basically, any idea that involves calling fsync() more often is going
to tend to smooth out the I/O load at the cost of some increase in the
total number of writes.  If we don't want any increase at all in the
number of writes, spreading out the fsync() calls is pretty much the
only other option.  I'm worried that even with good tuning that won't
be enough to tamp down the latency spikes.  But maybe it will be...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
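
P.S. To make the first-dirtied-LSN idea concrete, here is a rough toy
sketch in Python.  It is only an illustration of the scheme as
described above, not InnoDB's or PostgreSQL's actual code: the names
Page, BufferPool, modify and mini_checkpoint are all made up, LSNs are
plain integers, and "flushing" just appends the page id to a list
instead of doing a real write plus fsync().

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    # Hypothetical page header: LSN of the most recent change.
    page_lsn: int = 0
    # LSN of the change that first dirtied the page since its last
    # flush; None means the page is clean.
    first_dirtied_lsn: Optional[int] = None


@dataclass
class BufferPool:
    pages: Dict[int, Page] = field(default_factory=dict)
    redo_pointer: int = 0
    # Stand-in for real I/O: ids of pages "written", in order.
    flushed: List[int] = field(default_factory=list)

    def modify(self, page_id: int, lsn: int) -> None:
        """Record a change to a page at the given LSN."""
        page = self.pages.setdefault(page_id, Page())
        page.page_lsn = lsn
        if page.first_dirtied_lsn is None:
            # Only the first change after a flush sets this.
            page.first_dirtied_lsn = lsn

    def mini_checkpoint(self, target_lsn: int) -> int:
        """Flush pages first dirtied before target_lsn, then advance
        the redo pointer to target_lsn.  Returns pages written."""
        written = 0
        for page_id, page in sorted(self.pages.items()):
            dirtied = page.first_dirtied_lsn
            if dirtied is not None and dirtied < target_lsn:
                self.flushed.append(page_id)  # pretend write + fsync
                page.first_dirtied_lsn = None
                written += 1
        self.redo_pointer = max(self.redo_pointer, target_lsn)
        return written


pool = BufferPool()
pool.modify(1, lsn=100)
pool.modify(2, lsn=200)
pool.modify(1, lsn=300)  # page 1 changed again; first-dirtied stays 100
pool.modify(3, lsn=400)

n = pool.mini_checkpoint(250)
print(n, pool.flushed, pool.redo_pointer)
```

Checkpointing to LSN 250 writes pages 1 and 2 but leaves page 3 alone,
since it was first dirtied at 400.  Page 1 gets written even though it
was changed again at LSN 300; if it is dirtied once more before the
next mini-checkpoint it will be written a second time, which is
exactly the kind of extra write I mean.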