On Tue, Jun 25, 2013 at 1:15 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> I'm not sure it's a good idea to sleep proportionally to the time it took to
> complete the previous fsync. If you have a 1GB cache in the RAID controller,
> fsyncing a 1GB segment will fill it up. But since it fits in cache, it
> will return immediately. So we proceed fsyncing other files, until the cache
> is full and the fsync blocks. But once we fill up the cache, it's likely
> that we're hurting concurrent queries. ISTM it would be better to stay under
> that threshold, keeping the I/O system busy, but never fill up the cache
> completely.
Isn't the behavior implemented by the patch a reasonable approximation of just that? When the fsyncs start to get slow, that's when we start to sleep. I'll grant that it would be better to sleep when the fsyncs are *about* to get slow, rather than when they actually have become slow, but we have no way to know that. The only feedback we have on how bad things are is how long the last fsync took to complete, so I actually think that's a much better approach than any fixed sleep, which will often be unnecessarily long on a well-behaved system and far too short on one that's having trouble. I'm inclined to think Kondo-san has got it right.

I like your idea of putting a stake in the ground and assuming that the fsync phase will turn out to be X% of the checkpoint, but I wonder if we can be a bit more sophisticated, especially for cases where checkpoint_segments is small. When checkpoint_segments is large, we know that some of the data will get written back to disk during the write phase, because the OS cache is only so big. But when it's small, the OS will essentially do nothing during the write phase, and then it has to write all the data out during the fsync phase. I'm not sure we can really model that effect thoroughly, but even something dumb would be smarter than what we have now: e.g. use 10%, but when checkpoint_segments < 10, use 1/checkpoint_segments. Or just assume the fsync phase will take 30 seconds. Or ... something. I'm not really sure what the right model is here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)