The documentation says of continuous archiving: "While designing your archiving setup, consider what will happen if the archive command fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved reasonably quickly. The pg_xlog/ directory will continue to fill with WAL segment files until the situation is resolved. (If the file system containing pg_xlog/ fills up, PostgreSQL will do a PANIC shutdown. No committed transactions will be lost, but the database will remain offline until you free some space.)"
I think that it is not uncommon for archiving to fall seriously behind, risking a serious loss of availability. When this happens, the DBA has to fight against the clock to fix whatever problem there is with continuous archiving, hoping to catch up and prevent a PANIC shutdown. This is a particularly unpleasant problem to have.

At Heroku, we naturally monitor the state of continuous archiving on all clusters under our control. However, when faced with this situation, sometimes the least-worst option to buy time is to throttle Postgres using a crude mechanism: issuing repeated SIGSTOP and SIGCONT signals to all Postgres processes, with the exception of the archiver auxiliary process. Obviously this is a terrible thing to have to do, principally because it slows almost everything right down.

It would be far preferable to just slow down the writing of WAL segments when these emergencies arise, since that alone is what risks causing a PANIC shutdown when XLogWrite() cannot write WAL. Even if the pg_xlog directory is on the same filesystem as the database heap files, throttling WAL will obviously also throttle the operations that might cause those heap files to be enlarged. Reads (including the writes that enable reads, such as those performed by the background writer and backends to clean dirty buffers) and checkpointing are not affected (though of course checkpointing does have to write checkpoint WAL records, so perhaps not quite).

What I'd like to propose is that we simply sit on WALWriteLock for a configured delay in order to throttle the writing (though not the insertion) of WAL records. I've drafted a patch that does just that - it has the WAL Writer optionally sleep while holding WALWriteLock for some period of time once per activity cycle (avoiding WAL Writer hibernation).

If this sounds similar to commit_delay, that's because it is almost exactly the same. We just sleep within the WAL Writer rather than within a group commit leader backend, because then the throttling doesn't depend upon some backend hitting the XLogFlush()/commit_delay codepath. In a bulk loading situation, it's perfectly possible for no backend to hit XLogFlush() with any sort of regularity, so commit_delay cannot really be abused to do what I describe here. Besides, right now commit_delay is capped so that it isn't possible to delay for more than 1/10th of a second.

What I've proposed here has the disadvantage of making activity rounds of the WAL Writer take longer, thus considerably increasing the window before any asynchronous commits actually make it out to disk. However, that problem is inherent in any throttling of WAL writing as described here (XLogBackgroundFlush() itself acquires WALWriteLock anyway), so I don't imagine that there's anything that can be done about it other than having a clear warning.

I envisage this feature as very much a sharp tool, to be used by the DBA only when they are in a very tight bind. Better to at least be able to serve read queries when this problem arises, and to avoid throttling longer-running transactions whose writes don't need to make it out to disk right away. I also have a notion that WAL writing can usefully be throttled less aggressively than by almost or entirely preventing it; a third-party monitoring daemon could scale the throttling delay up or down as a function of how full the pg_xlog filesystem is.
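To make the core mechanism concrete, here is a minimal sketch of the idea (not the actual patch), assuming a hypothetical GUC named wal_write_throttle expressed in milliseconds, with 0 meaning disabled. It would run once per WAL Writer activity cycle, before the usual XLogBackgroundFlush() call in walwriter.c:

    /*
     * Hypothetical fragment for WalWriterMain()'s main loop in
     * src/backend/postmaster/walwriter.c.  The wal_write_throttle GUC
     * (milliseconds, 0 = disabled) is invented for illustration.
     */
    if (wal_write_throttle > 0)
    {
        /*
         * Hold WALWriteLock for the configured delay.  While we hold it,
         * nobody can write or flush WAL, although backends can continue
         * to insert WAL records into wal_buffers until the buffers fill.
         */
        LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
        pg_usleep(wal_write_throttle * 1000L);
        LWLockRelease(WALWriteLock);
    }

    /* ...then proceed with the normal XLogBackgroundFlush() call... */

And as a sketch of the "function of how full the filesystem is" idea, a monitoring daemon might compute a suggested delay along these lines. The thresholds, the linear ramp, and the example path are all made up for illustration; a real daemon would tune them, and would then apply the setting somehow (e.g. rewrite postgresql.conf and SIGHUP the postmaster):

    #include <stdio.h>
    #include <sys/statvfs.h>

    /*
     * Hypothetical: map how full the pg_xlog filesystem is to a suggested
     * throttle delay in milliseconds.
     */
    static long
    suggested_throttle_ms(const char *pg_xlog_path)
    {
        struct statvfs fs;
        double      used_frac;

        if (statvfs(pg_xlog_path, &fs) != 0 || fs.f_blocks == 0)
            return 0;               /* can't tell, so don't throttle */

        used_frac = 1.0 - (double) fs.f_bavail / (double) fs.f_blocks;

        if (used_frac < 0.80)
            return 0;               /* plenty of space left */
        if (used_frac >= 0.95)
            return 1000;            /* nearly full: throttle hard */

        /* ramp up linearly between 80% and 95% full */
        return (long) ((used_frac - 0.80) / 0.15 * 1000.0);
    }

    int
    main(void)
    {
        /* example path only */
        printf("suggested delay: %ld ms\n",
               suggested_throttle_ms("/var/lib/postgresql/9.3/main/pg_xlog"));
        return 0;
    }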
It might be better to modestly throttle WAL writing for two hours in order to allow continuous archiving to catch up, rather than sharply curtailing WAL writing for a shorter period.

Has anyone else thought about approaches to mitigating the problems that arise when an archive_command continually fails, and the DBA must manually clean up the mess?

--
Peter Geoghegan