On Thu, Nov 13, 2003 at 05:39:32PM -0500, Bruce Momjian wrote:
Jan Wieck wrote:
> Bruce Momjian wrote:
> > He found that write() itself didn't encourage the kernel to write the
> > buffers to disk fast enough. I think the final solution will be to use
> > fsync or O_SYNC.
> write() alone doesn't encourage the kernel to do any physical IO at all.
> As long as you have enough OS buffers, it does happy write caching until
> you checkpoint and sync(), and then the system freezes.
That's not completely true. Some kernels have trickle sync, meaning
they sync a little bit at regular intervals rather than all at once, so
write() does help get those shared buffers into the kernel for possible
writing. Also, it is possible the kernel will issue a sync() on its own.
So basically, on some kernels you want them to flush their dirty buffers faster.
I have a feeling that how we ask the kernel not to keep dirty data in memory too long should depend more on the system, and that sync(), fsync() or O_SYNC could be a fallback in case it's needed and there is no better way of doing it.
Maybe something like posix_fadvise() might be useful too on systems that have it?
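As a rough illustration of that fallback idea, here is a minimal sketch in C. The helper name hint_or_sync() is made up for this example and is not from any patch. It assumes that POSIX_FADV_DONTNEED is a reasonable hint to try; whether that hint makes a given kernel start writing dirty pages is system dependent, and it is never a durability guarantee, so a checkpoint would still need a real sync.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).
 *
 * Try to hint the kernel that the given range of fd is not needed in the
 * OS cache any more. If the hint is unavailable or rejected, fall back to
 * fsync(). A len of 0 means "to the end of the file".
 *
 * Returns 1 if the hint was accepted (no durability implied),
 *         0 if the fsync() fallback succeeded,
 *        -1 if the fallback failed.
 */
int
hint_or_sync(int fd, off_t offset, off_t len)
{
#ifdef POSIX_FADV_DONTNEED
    /* posix_fadvise() returns 0 on success or an error number, not -1. */
    if (posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED) == 0)
        return 1;
#endif
    return (fsync(fd) == 0) ? 0 : -1;
}
```

A caller would use it right after write(), for example hint_or_sync(fd, 0, 0) for the whole file, and treat a return of 1 as "the kernel was asked nicely", nothing more.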
That is all right, and as I said, how often, how much and how forcefully we do the IO can all be configurable and as flexible as people see fit. But whether you use sync(), fsync(), fdatasync(), O_SYNC, O_DSYNC or posix_fadvise(), somewhere you have to do the write(). And that write has to be coordinated with the buffer cache replacement strategy, so that you write those buffers that are likely to be replaced soon and don't write those that the strategy thinks are worth keeping for longer anyway. The exception is a checkpoint, where you have to write whatever is dirty.
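To make that coordination concrete, here is a small sketch of the selection step. It is an illustration under assumptions and not the code from the patch: the SketchBuffer type, the "lookahead" parameter and the function name are all invented here, and the array is assumed to be ordered by the replacement strategy with the next eviction victim first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer descriptor, for illustration only. */
typedef struct SketchBuffer
{
    int  buf_id;
    bool dirty;
} SketchBuffer;

/*
 * Pick the buffers to write in this round.
 *
 * by_eviction_order[0] is assumed to be the buffer the replacement
 * strategy would evict next. Normally only dirty buffers among the first
 * "lookahead" entries are chosen; at a checkpoint every dirty buffer is
 * chosen. At most out_cap ids are stored in out_ids.
 *
 * Returns the number of ids stored.
 */
size_t
choose_buffers_to_write(const SketchBuffer *by_eviction_order,
                        size_t nbuffers, size_t lookahead,
                        bool checkpoint, int *out_ids, size_t out_cap)
{
    size_t limit = nbuffers;
    size_t n = 0;

    if (!checkpoint && lookahead < nbuffers)
        limit = lookahead;

    for (size_t i = 0; i < limit && n < out_cap; i++)
    {
        if (by_eviction_order[i].dirty)
            out_ids[n++] = by_eviction_order[i].buf_id;
    }
    return n;
}
```

With five buffers of which the 1st, 3rd and 5th are dirty and a lookahead of 3, a normal round picks the 1st and 3rd only, while a checkpoint picks all three.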
The patch I posted does this write() in coordination with the strategy in a separate background process, so that the regular backends don't have to write under normal circumstances (there are some places in DDL statements that call BufferSync(), but those are exceptions IMHO).
Can we agree on this general outline? Or do we have any better proposals?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #