On 10/27/2014 03:21 PM, Tomas Vondra wrote:
> On 27 October 2014, 10:47, Heikki Linnakangas wrote:
>> On 10/26/2014 11:47 PM, Tomas Vondra wrote:
>>> After eyeballing the code for an hour or two, I think CREATE DATABASE
>>> should be fine with performing only a 'partial checkpoint' on the
>>> template database - calling FlushDatabaseBuffers and processing unlink
>>> requests, as suggested by the comment in createdb().
>> Hmm. You could replace the first checkpoint with that, but I don't think
>> that's enough for the second. To get any significant performance
>> benefit, you need to get rid of both checkpoints, because doing two
>> checkpoints one after another is almost as fast as doing a single
>> checkpoint; the second checkpoint has very little work to do because the
>> first checkpoint already flushed out everything.
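A toy model may make the cost argument concrete. The sketch below is hypothetical Python, not PostgreSQL code: shared buffers are reduced to a set of dirty pages tagged by database, `checkpoint()` writes all of them, and `flush_database()` stands in loosely for FlushDatabaseBuffers by writing only one database's pages. The page counts are made up, and pages dirtied between the two calls are ignored.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Page = Tuple[str, int]  # hypothetical: (database name, page number)


@dataclass
class Buffers:
    dirty: Set[Page] = field(default_factory=set)

    def checkpoint(self) -> int:
        """Full checkpoint: write out every dirty page in the cluster."""
        written = len(self.dirty)
        self.dirty.clear()
        return written

    def flush_database(self, db: str) -> int:
        """Loose analogue of FlushDatabaseBuffers: write one database's pages."""
        pages = {p for p in self.dirty if p[0] == db}
        self.dirty -= pages
        return len(pages)


def make_buffers() -> Buffers:
    # Made-up workload: 100 dirty template pages, 900 in other databases.
    dirty = {("tmpl", i) for i in range(100)}
    dirty |= {("other", i) for i in range(900)}
    return Buffers(dirty)


# Variant A: full checkpoint before the copy and another one after it.
a = make_buffers()
a_first = a.checkpoint()
a_second = a.checkpoint()

# Variant B: replace only the first checkpoint with a per-database flush.
b = make_buffers()
b_first = b.flush_database("tmpl")
b_second = b.checkpoint()

print(a_first, a_second)  # 1000 0
print(b_first, b_second)  # 100 900
```

In both variants 1000 pages get written before the toy CREATE DATABASE can finish; replacing only the first checkpoint just shifts the work into the second one.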
>> The second checkpoint, after copying but before commit, is done because
>> (from the comments in the createdb function):
>> * #1: When PITR is off, we don't XLOG the contents of newly created
>> * indexes; therefore the drop-and-recreate-whole-directory behavior
>> * of DBASE_CREATE replay would lose such indexes.
>> * #2: Since we have to recopy the source database during DBASE_CREATE
>> * replay, we run the risk of copying changes in it that were
>> * committed after the original CREATE DATABASE command but before the
>> * system crash that led to the replay. This is at least unexpected
>> * and at worst could lead to inconsistencies, eg duplicate table
>> * names.
>> Doing only FlushDatabaseBuffers would not prevent these issues - you
>> need a full checkpoint. These issues are better explained here:
>> http://www.postgresql.org/message-id/28884.1119727...@sss.pgh.pa.us
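Hazard #2 can be illustrated with a small, hypothetical model (Python, not the actual DBASE_CREATE redo code): a database directory is just a set of table names, and both the original command and its replay copy the source directory as it looks at that moment. The names `copy_database`, `tmpl` and `newdb` are made up, and the sketch assumes the later change to the template has already reached disk when the crash happens.

```python
def copy_database(disk: dict, src: str, dst: str) -> None:
    # Drop-and-recreate-whole-directory: dst becomes a copy of src as it is now.
    disk[dst] = set(disk[src])


disk = {"tmpl": {"t1"}}  # hypothetical: database name -> table names on disk

# Original CREATE DATABASE newdb, using tmpl as the template.
copy_database(disk, "tmpl", "newdb")
as_created = set(disk["newdb"])

# Afterwards, a committed change adds t2 to the template and reaches disk.
disk["tmpl"].add("t2")

# Crash. Recovery starts before the DBASE_CREATE record and replays it.
copy_database(disk, "tmpl", "newdb")

print(sorted(as_created))     # ['t1']
print(sorted(disk["newdb"]))  # ['t1', 't2']
```

The replayed copy contains t2 even though the database as originally created did not.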
> Thinking about this a bit more, do we really need a full checkpoint? That
> is, a checkpoint of all the databases in the cluster? Why is checkpointing
> the source database not enough?
> I mean, when we use database A as a template, why do we need to checkpoint
> B, C, D and F too? (Apologies if this is somehow obvious; I'm way out of
> my comfort zone in this part of the code.)
A full checkpoint ensures that you always begin recovery *after* the
DBASE_CREATE record. I.e. you never replay a DBASE_CREATE record during
crash recovery (except when you crash before the transaction commits, in
which case it doesn't matter if the new database's directory is borked).
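One way to see why a per-database flush cannot stand in for a full checkpoint: the cluster has a single WAL stream, and a checkpoint's redo pointer is one position in it. The hypothetical sketch below (toy LSNs and record names, not real WAL record formats) replays every record at or after the redo pointer, whatever database it belongs to. Moving the redo pointer past DBASE_CREATE also means no longer replaying the earlier change to database B, which is only safe if B's dirty pages were flushed too.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # hypothetical: (lsn, description)


def records_to_replay(wal: List[Record], redo_lsn: int) -> List[str]:
    # Recovery replays everything at or after the redo pointer,
    # regardless of which database a record touches.
    return [desc for lsn, desc in wal if lsn >= redo_lsn]


wal = [
    (10, "insert into db B"),
    (20, "DBASE_CREATE template=A new=X"),
    (30, "insert into db C"),
    (40, "COMMIT of CREATE DATABASE"),
]

# No checkpoint after the copy: the redo pointer precedes DBASE_CREATE,
# so recovery replays it (and everything else).
print(records_to_replay(wal, redo_lsn=5))

# A full checkpoint begun after the DBASE_CREATE record (redo LSN 25 here)
# moves the cluster-wide starting point past it.
print(records_to_replay(wal, redo_lsn=25))
# ['insert into db C', 'COMMIT of CREATE DATABASE']
```

With the redo pointer at 25 the DBASE_CREATE record is never replayed; the insert into B at LSN 10 is skipped as well, which is why the checkpoint has to cover every database.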
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)