On Fri, Sep 13, 2013 at 4:04 PM, Kevin Grittner <kgri...@ymail.com> wrote:
> Andres Freund <and...@2ndquadrant.com> wrote:
>
>> Absolutely not claiming the contrary. I think it sucks that we
>> couldn't fully figure out what's happening in detail. I'd love to
>> get my hand on a setup where it can be reliably reproduced.
>
> I have seen two completely different causes for symptoms like this,
> and I suspect that these aren't the only two.
>
> (1) The dirty page avalanche: PostgreSQL hangs on to a large
> number of dirty buffers and then dumps a lot of them at once. The
> OS does the same. When PostgreSQL dumps its buffers to the OS it
> pushes the OS over a "tipping point" where it is writing dirty
> buffers too fast for the controller's BBU cache to absorb them.
> Everything freezes until the controller writes and accepts OS
> writes for a lot of data. This can take several minutes, during
> which time the database seems "frozen". Cure is some combination
> of these: reduce shared_buffers, make the background writer more
> aggressive, checkpoint more often, make the OS dirty page writing
> more aggressive, add more BBU RAM to the controller.
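
To watch for the buildup described in (1), here is a minimal sketch, assuming a Linux host where /proc/meminfo exposes the Dirty and Writeback fields. The function names are hypothetical and only for illustration; sampling this periodically around a stall should show whether the OS is sitting on a large pile of dirty pages.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total)
    are plain counts with no unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_summary(text):
    """Return the amount of dirty and in-flight writeback memory, in kB."""
    info = parse_meminfo(text)
    return {
        "dirty_kb": info.get("Dirty", 0),
        "writeback_kb": info.get("Writeback", 0),
    }


def main():
    # Assumption: Linux. On other systems the file does not exist.
    try:
        with open("/proc/meminfo") as f:
            print(dirty_summary(f.read()))
    except OSError as exc:
        print("could not read /proc/meminfo:", exc)


if __name__ == "__main__":
    main()
```
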

Yeah -- I've seen this too, and it's a well-understood problem.
Getting the OS to flush dirty pages out faster is the name of the
game, I think. Storage is getting so fast that it's (mostly) moot
anyway. Also, this falls under the umbrella of 'high I/O' -- the
stuff I've been seeing is low- or no-I/O.

> (2) Transparent huge page support goes haywire on its defrag work.
> Clues on this include very high "system" CPU time during an
> episode, and `perf top` shows more time in kernel spinlock
> functions than anywhere else. The database doesn't completely lock
> up like with the dirty page avalanche, but it is slow enough that
> users often describe it that way. So far I have only seen this
> cured by disabling THP support (in spite of some people urging that
> just the defrag be disabled). It does make me wonder whether there
> is something we could do in PostgreSQL to interact better with
> THPs.

Ah, that's a useful tip; need to research that, thanks. Maybe Josh
might be able to give it a whirl...

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
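
For checking whether THP and its defrag are active, as discussed in (2), a minimal sketch follows. It assumes a Linux kernel that exposes the settings under /sys/kernel/mm/transparent_hugepage (some distribution kernels use a different directory), where the active choice is shown in square brackets, e.g. "[always] madvise never". The function name is hypothetical.

```python
import re

# Assumption: mainline sysfs location; some vendor kernels differ.
THP_DIR = "/sys/kernel/mm/transparent_hugepage"


def active_setting(text):
    """Return the bracketed (active) choice from a THP sysfs file, or None."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def main():
    for name in ("enabled", "defrag"):
        path = THP_DIR + "/" + name
        try:
            with open(path) as f:
                print(name, "=", active_setting(f.read()))
        except OSError as exc:
            print("could not read", path, ":", exc)


if __name__ == "__main__":
    main()
```

This only reports the current state; changing it requires root and is a separate decision.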