Here's an attempt to summarize the remaining issues with this patch that I know about. I may have forgotten something, so please mention it if you notice something missing.
1. pg_dump needs an option to control whether unlogged tables are dumped. --no-unlogged-tables seems like the obvious choice, assuming we want the default to be to dump them, which seems like the safest option.

2. storage.sgml likely needs to be updated. We have a section on the free space map and one on the visibility map, so I suppose the logical thing to do is add a similar section on the initialization fork.

3. It's unnecessary to include unlogged relation buffers in non-shutdown checkpoints. I've recently realized that this is true independently of whether or not we want unlogged tables to survive a clean shutdown. Whether we can survive a clean shutdown is a function of whether we register dirty segments when buffers are written, which is independent of whether we choose to write such buffers as part of a checkpoint. And indeed, unless we're about to shut down, there's no reason to do so, because the whole point of checkpointing is to advance the redo pointer, and that's irrelevant for unlogged tables.

4. It's arguably unnecessary to register dirty segments for unlogged relations. Given #3, this now seems a little less important. If the unlogged relation is hot and fits in shared_buffers, then omitting it from the checkpoint process means we'll never write out those dirty buffers, so the fact that they'd cause fsyncs if we did write them doesn't matter. However, it's still not totally irrelevant, because a relation that fits in the OS buffer cache but not in shared buffers will probably generate fsyncs at every checkpoint. (And on the third hand, the OS may decide to write the dirty data anyway, especially if it's a largish percentage of RAM.) There are a few possible ways of dealing with this:

4A. The solution Andres proposed: iterate through all unlogged relations at shutdown time and fsync them all. Handling fsync failures could be complicated.

4B. Another idea I just thought of: register dirty segments as normal, but teach the background writer to accumulate them in a separate queue that is only flushed at shutdown, or when it reaches some maximum size, rather than at every checkpoint.

4C. Decree that this is an area for future enhancement and forget about it for now. I am leaning toward this option.

5. Make it work with GiST indexes. Per discussion on the other thread, the current proposal seems to be: (a) add a BM_FLUSH_XLOG bit; when clear, don't flush XLOG; this then allows pages to have fake LSNs; (b) add an XLogRecPtr structure in shared memory, protected by a spinlock; (c) use the structure described in (b) to generate fake LSNs every time an operation is performed on an unlogged GiST index. (A toy sketch of that counter idea is appended at the end of this mail.) I am not clear on how we make this work across shutdowns - it seems you'd need to save this structure somewhere during a clean shutdown (where?) and restore it on startup, unless we go back to truncating even on a clean shutdown.

6. Make it work with GIN indexes. I haven't looked at what's involved here yet.

Advice, comments, feedback appreciated... I'd like to put this one to bed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
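
P.S. To make point 5 a bit more concrete, here is a toy, standalone model of the fake-LSN counter described in (b)/(c). It is only a sketch: the names are made up for illustration, and in the real patch the counter would be an XLogRecPtr living in shared memory and protected by a backend spinlock rather than a pthread mutex. The point is just that every operation on an unlogged GiST index gets stamped with a strictly increasing value, so LSN ordering within the index still holds even though nothing is written to XLOG.

    /*
     * Toy, standalone model of the fake-LSN counter from point 5 --
     * NOT actual backend code.  A pthread mutex and a plain uint64_t
     * stand in for the spinlock-protected XLogRecPtr in shared memory.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <pthread.h>

    typedef struct
    {
        pthread_mutex_t lock;      /* stand-in for a backend spinlock */
        uint64_t        counter;   /* stand-in for the shared XLogRecPtr */
    } FakeLSNState;

    static FakeLSNState fakeLSN = { PTHREAD_MUTEX_INITIALIZER, 1 };

    /* Hand out the next fake LSN; the name is invented for this sketch. */
    static uint64_t
    GetFakeLSN(void)
    {
        uint64_t    result;

        pthread_mutex_lock(&fakeLSN.lock);
        result = fakeLSN.counter++;
        pthread_mutex_unlock(&fakeLSN.lock);
        return result;
    }

    int
    main(void)
    {
        /*
         * Each "operation" on an unlogged GiST index would stamp its page
         * with the next value returned here.
         */
        for (int i = 0; i < 3; i++)
            printf("fake LSN %llu\n", (unsigned long long) GetFakeLSN());
        return 0;
    }

The unresolved part is the one mentioned in point 5: where that counter value would be saved at a clean shutdown and restored at startup, if we don't truncate.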