Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> You want to find, open, and fsync() every file in the database cluster
>> for every checkpoint? Sounds like a non-starter to me.
> Except a) this is outside any critical path, and b) only done every few
> minutes and c) the fsync calls on files with no dirty buffers ought to be
> cheap, at least as far as i/o.

The directory search and opening of the files is in itself nontrivial
overhead ... particularly on systems where open(2) isn't speedy, such as
Solaris. I also disbelieve your assumption that fsync'ing a file that
doesn't need it will be free. That depends entirely on what sort of
indexes the OS keeps on its buffer cache. There are Unixen where fsync
requires a scan through the entire buffer cache because there is no data
structure that permits finding associated buffers any more efficiently
than that. (IIRC, the HPUX system I'm typing this on is like that.) On
those sorts of systems, we'd be way better off to use O_SYNC or O_DSYNC
on all our writes than to invoke multiple fsyncs. Check the archives ---
this was all gone into in great detail when we were testing alternative
methods for fsyncing the WAL files.

> So the NetBSD and Sun developers I checked with both asserted fsync does in
> fact guarantee this. And SUSv2 seems to back them up:

> The fsync() function can be used by an application to indicate that all
> data for the open file description named by fildes is to be transferred to
> the storage device associated with the file described by fildes in an
> implementation-dependent manner.

The question here is what is meant by "data for the open file
description". If it said "all data for the file referenced by the open
FD" then I would agree that the spec says what you claim. As is, I think
it would be entirely within the spec for the OS to dump only buffers that
had been dirtied through that particular FD. Notice that the last part of
the sentence is careful to respect the distinction between the FD and the
file; why isn't the first part?
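For concreteness, here is a minimal sketch of the two call patterns being compared above. It is illustrative only: the function names are made up for this example, it is Python rather than anything in the backend, and it assumes a Unix-like system (the O_DSYNC flag is looked up defensively because not every platform exposes it). The second function shows the pattern whose guarantee is in dispute: the data is dirtied through one descriptor and fsync() is issued through a different one. The code shows only the calls; it cannot demonstrate whether a given OS actually flushes those buffers.

```python
import os

# Assumption: prefer O_DSYNC, fall back to O_SYNC, and finally to 0 (no
# synchronous flag at all) so that the sketch still runs everywhere.
SYNC_FLAG = getattr(os, "O_DSYNC", getattr(os, "O_SYNC", 0))


def write_synchronously(path, data):
    """Hypothetical helper: write through a descriptor opened with O_DSYNC."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | SYNC_FLAG, 0o600)
    try:
        # With a synchronous flag set, write() itself waits for the data.
        return os.write(fd, data)
    finally:
        os.close(fd)


def write_then_fsync_elsewhere(path, data):
    """Hypothetical helper: dirty buffers via one FD, fsync via another."""
    writer = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = os.write(writer, data)
    finally:
        os.close(writer)

    # Checkpoint-style step: reopen the file and fsync the new descriptor.
    # Whether this flushes what `writer` dirtied is the open question.
    syncer = os.open(path, os.O_RDWR)
    try:
        os.fsync(syncer)
    finally:
        os.close(syncer)
    return written
```

The first pattern pays the sync cost on every write but never needs the directory search, the extra open(), or any assumption about what fsync covers; the second defers the cost to checkpoint time and depends on all three.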
			regards, tom lane