On 29 May 2012 17:58, Robert Haas <robertmh...@gmail.com> wrote: > On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <pe...@2ndquadrant.com> > wrote: >> Why do you think that doing this for all XLogFlush() callsites might >> be problematic? > > Well, consider the one in the background writer, for example. That's > just a periodic flush, so I see no benefit in having it acquire the > lock and then wait some more. It already did wait.
No benefit for the BGWriter, no, but a benefit for the cluster as a whole, which can see not only an improvement in transaction throughput, but also of average and worst-case latency. The whole idea of a group commit feature is that one backend altruistically takes other backends with it. Besides, for the background writer or checkpointer, that operate in terms of minutes, seconds and milliseconds, having their transaction block an extra 2 milliseconds is hardly a real problem. Both the BGWriter and checkpointer do an amount of work that is adaptive per cycle anyway. > And what about the case where we're flushing while holding WALInsertLock > because the > buffer's full? Clearly waiting is useless in that case - nobody can > join the group commit for exactly the same reason that we're doing the > flush in the first place: no buffer space. Maybe you could have the leader check that condition, and not wait accordingly. If the buffer is full, chances are very high that there already is a leader, either sleeping or following through with an XLogWrite(), whose work is well underway. The general solution that you've proposed - adding an actually_wait bool parameter to XLogFlush() - won't work, unless, for example, the "buffer is full" call happens to become the leader, which is generally quite unlikely. Otherwise it's stuck waiting behind WALWriteLock along with everyone else, on average for a duration of half (CommitDelay + avg flushtime), while the leader sleeps and then works. If something is only effective a small minority of the time, why bother? I doubt that you can reasonably have the BGWriter or whatever cut in line ahead of an ever-growing queue of backends calling XLogFlush(), if that's what you're thinking - now all those backends have to wait an additional period of up to however long the flush takes. In any case, the standard for changing the exact behaviour of commit_delay/commit_siblings ought to be quite low, because those settings are already so heavily laden with problems that it is somewhat doubtful that they've ever been used by someone in production sensibly. Obviously having a delay that turns out to be in some way suboptimal is obviously something I'd hope to avoid entirely, but I've done a good job of considerably improving that situation. If we get this into 9.2, perhaps a more adaptive implementation can follow in 9.3. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers