On Wed, Jan 15, 2014 at 7:12 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> > On 01/15/2014 07:50 AM, Dave Chinner wrote:
> >> FWIW [and I know you're probably sick of hearing this by now], but
> >> the blk-io throttling works almost perfectly with applications that
> >> use direct IO.....
>
> > For checkpoint writes, direct I/O actually would be reasonable.
> > Bypassing the OS cache is a good thing in that case - we don't want the
> > written pages to evict other pages from the OS cache, as we already have
> > them in the PostgreSQL buffer cache.
>
> But in exchange for that, we'd have to deal with selecting an order to
> write pages that's appropriate depending on the filesystem layout,
> other things happening in the system, etc etc. We don't want to build
> an I/O scheduler, IMO, but we'd have to.
>
> > Writing one page at a time with O_DIRECT from a single process might be
> > quite slow, so we'd probably need to use writev() or asynchronous I/O to
> > work around that.
>
> Yeah, and if the system has multiple spindles, we'd need to be issuing
> multiple O_DIRECT writes concurrently, no?

writev effectively does do that, doesn't it? But they do have to be on the
same file handle, so that could be a problem. I think we need something
like sorted checkpoints sooner or later, anyway.

> What we'd really like for checkpointing is to hand the kernel a boatload
> (several GB) of dirty pages and say "how about you push all this to disk
> over the next few minutes, in whatever way seems optimal given the storage
> hardware and system situation. Let us know when you're done."

And most importantly, "Also, please don't freeze up everything else in the
process"

Cheers,

Jeff
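
To make the writev() point concrete, here is a minimal sketch (an illustration only, not PostgreSQL code) of handing several page buffers to the kernel in one call. The helper name write_pages_gathered, the 8 kB PAGE_SZ and the MAX_BATCH limit are assumptions made for the example. It leaves out O_DIRECT, which would additionally require suitably aligned buffers (for instance from posix_memalign) and filesystem support.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std= modes */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAGE_SZ   8192      /* assumed page size for this sketch */
#define MAX_BATCH 16        /* arbitrary batch limit; must stay <= IOV_MAX */

/*
 * Hypothetical helper: write npages separate buffers with one writev().
 * The pages land back to back in the file starting at byte offset
 * 'start', and all of them go to the same descriptor.
 * Returns 0 on success, -1 on error. A short write is treated as an
 * error here to keep the sketch small; real code would retry the rest.
 */
static int
write_pages_gathered(int fd, char *const pages[], int npages, off_t start)
{
    struct iovec iov[MAX_BATCH];
    ssize_t written;
    int i;

    if (npages <= 0 || npages > MAX_BATCH)
        return -1;

    /* One iovec entry per page buffer; the buffers need not be adjacent in memory. */
    for (i = 0; i < npages; i++) {
        iov[i].iov_base = pages[i];
        iov[i].iov_len = PAGE_SZ;
    }

    if (lseek(fd, start, SEEK_SET) == (off_t) -1)
        return -1;

    written = writev(fd, iov, npages);
    return (written == (ssize_t) npages * PAGE_SZ) ? 0 : -1;
}
```

Because a single writev() targets one descriptor and writes its buffers contiguously, it only helps when the dirty pages are neighbours in the same file, which is one reason sorting the checkpoint writes would matter.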