"Zeugswetter Andreas SB SD" <[EMAIL PROTECTED]> writes: > So Imho the target should be to have not much IO open for the checkpoint, > so the fsync is fast enough, even if serial.
The best we can do is push out dirty pages with write() via the bgwriter and hope that the kernel will see fit to write them before checkpoint time arrives. I am not sure if that hope has basis in fact or if it's just wishful thinking. Most likely, if it does have basis in fact, it's because there is a standard syncer daemon forcing a sync() every thirty seconds.

That means that instead of an I/O storm every checkpoint interval, we get a smaller I/O storm every 30 seconds. Not sure this is a big improvement. Jan already found out that issuing very frequent sync()s isn't a win.

People keep saying that the bgwriter mustn't write pages synchronously because it'd be bad for performance, but I think that analysis is faulty. Performance of what --- the bgwriter? Nonsense, the *point* of the bgwriter is to do the slow tasks.

The only argument that has any merit is that O_SYNC or immediate fsync will prevent us from having multiple writes outstanding and thus reduce the efficiency of disk write scheduling. This is a valid point, but there is a limit to how many writes we need to have in flight to keep things flowing smoothly.

What I'm thinking now is that the bgwriter should issue frequent fsyncs for its writes --- not immediate, but a lot more often than once per checkpoint. Perhaps take one recently-written unsynced file to fsync every time it is about to sleep. You could imagine various rules for deciding which one to sync; perhaps the one with the most writes issued against it since last sync. When we have tablespaces it'd make sense to try to distribute the syncs across tablespaces, on the assumption that the tablespaces are probably on different drives.

			regards, tom lane
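
To make the "one fsync per sleep" rule above concrete, here is a rough sketch in Python. It is only an illustration of the selection policy, not PostgreSQL code: the class SyncScheduler and its methods note_write, pick_victim, and sync_one are hypothetical names, and the injectable sync_fn parameter is an assumption added so the policy can be exercised without touching a disk. It implements only the "most writes since last sync" rule; distributing syncs across tablespaces is not modeled.

```python
import os
from collections import defaultdict


def fsync_path(path):
    """Open the file at path, fsync it, and close it again."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


class SyncScheduler:
    """Hypothetical helper: tracks unsynced writes per file for a writer loop."""

    def __init__(self, sync_fn=fsync_path):
        # path -> number of writes issued since that file was last synced
        self.pending = defaultdict(int)
        # sync_fn is an assumption for illustration; it defaults to a real fsync
        self.sync_fn = sync_fn

    def note_write(self, path):
        """Record one write issued against path that has not been synced yet."""
        self.pending[path] += 1

    def pick_victim(self):
        """Return the file with the most unsynced writes, or None if all are clean."""
        if not self.pending:
            return None
        return max(self.pending, key=self.pending.get)

    def sync_one(self):
        """Sync the single busiest file; intended to run just before each sleep."""
        path = self.pick_victim()
        if path is None:
            return None
        self.sync_fn(path)
        # The file is clean again, so forget its write count.
        del self.pending[path]
        return path
```

A writer loop would call note_write() after each write() and sync_one() right before it goes to sleep, so at most one file is forced out per cycle and the rest stay available for the kernel to schedule.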