Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

MauMau Sat, 08 Jun 2013 17:34:01 -0700

From: "Josh Berkus" <j...@agliodbs.com>

There's actually three potential failure cases here:


- One Volume: WAL is on the same volume as PGDATA, and that volume is
completely out of space.

- XLog Partition: WAL is on its own partition/volume, and fills it up.

- Archiving: archiving is failing or too slow, causing the disk to fill
up with waiting log segments.


I think there is one more case.  Is this correct?

- Failure of a disk containing data directory or tablespace

If a checkpoint can't write buffers to disk because of a disk failure, the checkpoint cannot complete, so WAL files accumulate in pg_xlog/.

This means that one disk failure will lead to postgres shutdown.

XLog Partition
--------------

As Heikki pointed out, a full dedicated WAL drive is hard to fix once
it gets full, since there's nothing you can safely delete to clear
space, even enough for a checkpoint record.

This sounds very scary. Is it possible to complete recovery and start up postmaster with either or both of the following modifications?


[Idea 1]

During recovery, force archiving a WAL file and delete/recycle it in pg_xlog/ as soon as its contents are applied.
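
A rough way to picture Idea 1 is the toy Python model below. It is only a sketch under stated assumptions, not how the server's recovery code is structured: replay_and_recycle and apply_segment are hypothetical names, a plain file copy stands in for archive_command, and every file in the directory is assumed to be a WAL segment on a single timeline.

```python
import shutil
from pathlib import Path


def replay_and_recycle(xlog_dir, archive_dir, apply_segment):
    """Toy model of Idea 1: archive and remove each segment right after replay.

    apply_segment is a hypothetical callback standing in for WAL redo.
    Returns the names of the segments processed, in replay order.
    """
    xlog_dir, archive_dir = Path(xlog_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    replayed = []
    # Assumption: segment file names sort in replay order (one timeline).
    for seg in sorted(p for p in xlog_dir.iterdir() if p.is_file()):
        apply_segment(seg)                         # replay the segment first
        shutil.copy2(seg, archive_dir / seg.name)  # stand-in for archive_command
        seg.unlink()                               # free space before the next one
        replayed.append(seg.name)
    return replayed
```

The point of the ordering is that a segment is removed only after it has been both applied and copied to the archive, so space is released steadily during recovery instead of only at the end.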


[Idea 2]

During recovery, when disk full is encountered at end-of-recovery checkpoint, force archiving all unarchived WAL files and delete/recycle them at that time.
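
Idea 2 could be pictured along the lines of the toy Python model below. Again this is only a sketch: checkpoint_with_fallback and write_checkpoint are hypothetical names, a plain file copy stands in for archive_command, and it assumes every segment left in the directory has already been replayed. A real implementation would have to keep whatever WAL is still needed.

```python
import errno
import shutil
from pathlib import Path


def checkpoint_with_fallback(xlog_dir, archive_dir, write_checkpoint):
    """Toy model of Idea 2: on disk full, move WAL to the archive and retry once.

    write_checkpoint is a hypothetical callback standing in for the
    end-of-recovery checkpoint. Returns the number of segments moved out.
    """
    xlog_dir, archive_dir = Path(xlog_dir), Path(archive_dir)
    try:
        write_checkpoint()
        return 0                        # enough space, nothing to do
    except OSError as exc:
        if exc.errno != errno.ENOSPC:   # only handle "no space left on device"
            raise
    archive_dir.mkdir(parents=True, exist_ok=True)
    freed = 0
    # Assumption: all remaining segments were already applied during recovery.
    for seg in sorted(p for p in xlog_dir.iterdir() if p.is_file()):
        shutil.copy2(seg, archive_dir / seg.name)  # stand-in for archive_command
        seg.unlink()
        freed += 1
    write_checkpoint()                  # retry; a second failure propagates
    return freed
```

Compared with Idea 1, this leaves normal recovery untouched and only does the extra archiving when the checkpoint actually fails for lack of space.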



Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
