Let's talk failure cases. There are actually three potential failure cases here:

- One Volume: WAL is on the same volume as PGDATA, and that volume is completely out of space.
- XLog Partition: WAL is on its own partition/volume, and fills it up.
- Archiving: archiving is failing or too slow, causing the disk to fill up with waiting log segments.

I'll argue that these three cases need to be dealt with in three different ways, and no single solution is going to work for all three.

Archiving
---------

In some ways, this is the simplest case. Really, we just need a way to know when the available WAL space has become 90% full, and abort archiving at that stage. Once we stop attempting to archive, we can clean up the unneeded log segments.

What we also need is a better way for the DBA to find out that archiving is falling behind as soon as it starts to fall behind. Tailing the log and examining the rather cryptic error messages we give out isn't very effective.

XLog Partition
--------------

As Heikki pointed out, a dedicated WAL drive is hard to fix once it gets full, since there's nothing you can safely delete to clear space, even enough for a checkpoint record.

On the other hand, it should be easy to prevent it from filling up in the first place; we could simply force a non-spread checkpoint whenever the available WAL space gets 90% full. We'd also probably want to be prepared to switch to a read-only mode if we get full enough that there's only room for the checkpoint records.

One Volume
----------

This is the most complicated case, because we wouldn't necessarily run out of space because of WAL using it up. Anything could cause us to run out of disk space, including activity logs, swapping, pgsql_tmp files, database growth, or some other process which writes files.

This means that getting out of disk-full manually is in some ways easier for the DBA; there's usually stuff she can delete. However, it's much harder -- maybe impossible -- for PostgreSQL to prevent this kind of space outage.
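
To make the three cases concrete, here's a rough Python sketch of how an external monitoring script might tell them apart. It's only an illustration, not anything PostgreSQL does today: the function names, the backlog limit of 16 segments, and the order of the checks are all my assumptions. The only things taken from the real layout are the 90% figure used above and the fact that segments waiting to be archived leave a .ready file in the archive_status directory under the WAL directory.

```python
import os
import shutil

THRESHOLD = 0.90  # the "90% full" figure used in the text above


def used_fraction(path):
    """Fraction (0.0 to 1.0) of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def wal_has_own_volume(pgdata, wal_dir):
    """True if the WAL directory is on a different device than PGDATA."""
    return os.stat(pgdata).st_dev != os.stat(wal_dir).st_dev


def segments_awaiting_archive(wal_dir):
    """Count .ready files, i.e. segments the archiver has not yet handled."""
    status_dir = os.path.join(wal_dir, "archive_status")
    if not os.path.isdir(status_dir):
        return 0
    return sum(1 for name in os.listdir(status_dir) if name.endswith(".ready"))


def classify(fraction_used, own_volume, ready_segments, backlog_limit=16):
    """Map measurements onto the three cases (backlog_limit is hypothetical)."""
    if fraction_used < THRESHOLD:
        return "ok"
    if ready_segments > backlog_limit:
        return "archiving"
    return "xlog partition" if own_volume else "one volume"
```

A script would call classify(used_fraction(wal_dir), wal_has_own_volume(pgdata, wal_dir), segments_awaiting_archive(wal_dir)) on a timer and alert on anything other than "ok". Note that in the one-volume case this says nothing about what actually ate the space, which is exactly the hard part.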

There should be things we can do to make it easier for the DBA to troubleshoot this, but I'm not sure what. We could use a hard limit for WAL to prevent WAL from contributing to out-of-space, but that'll only prevent a minority of cases.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com