On Mar 12, 2007, at 8:08 PM, Tom Lane wrote:
Randy Isbell <[EMAIL PROTECTED]> writes:
One thing I do find interesting: while the backup created with my
"online" / Chap 23.3 processing contains 18 WAL files, only the last
3 (most recent) are used during the recovery process. Is that
normal?
Hmm, no, that seems pretty suspicious. The thing *should* be starting
from the checkpoint created by pg_start_backup(), but that observation
makes it sound like it is starting from some later one, probably
whatever happened to be current when pg_control got archived. Please
check this by noting what the startup messages say about where replay
commences, and compare to the backup label file (plain text, you can
just look at it) and pg_control (use pg_controldata before starting
the
replay).
Here is the pg_controldata output for the saved cluster prior to
startup:
pg_control version number: 822
Catalog version number: 200611241
Database system identifier: 5038266934557908351
Database cluster state: in production
pg_control last modified: Tue Mar 6 12:43:53 2007
Current log file ID: 2
Next log file segment: 6
Latest checkpoint location: 2/3035F48
Prior checkpoint location: 2/292EC
Latest checkpoint's REDO location: 2/3000840
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/18823426
Latest checkpoint's NextOID: 8774996
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Time of latest checkpoint: Tue Mar 6 12:41:41 2007
Minimum recovery ending location: 0/0
Maximum data alignment: 4
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Date/time type storage: 64-bit integers
Maximum length of locale name: 128
LC_COLLATE: C
LC_CTYPE: C
Here is the backup label file:
# cat 0000000100000001000000F3.00FB4C3C.backup
START WAL LOCATION: 1/F3FB4C3C (file 0000000100000001000000F3)
STOP WAL LOCATION: 2/56912D8 (file 000000010000000200000005)
CHECKPOINT LOCATION: 1/F3FB4C3C
START TIME: 2007-03-06 12:25:05 CST
LABEL: /usr/local/bck/db/sn200703061224.tar.gz
STOP TIME: 2007-03-06 12:44:27 CST
Here are the startup messages:
<2007-03-12 10:39:10 CDT 82475>LOG: could not create socket for
statistics collector: Protocol not supported
<2007-03-12 10:39:10 CDT 82475>LOG: trying another address for the
statistics collector
<2007-03-12 10:39:10 CDT 82889>LOG: database system was interrupted
at 2007-03-06 12:43:53 CST
<2007-03-12 10:39:10 CDT 82889>LOG: starting archive recovery
<2007-03-12 10:39:10 CDT 82889>LOG: restore_command = "cp /usr/local/
pgsql/walarch/%f %p"
cp: /usr/local/pgsql/walarch/00000001.history: No such file or directory
<2007-03-12 10:39:10 CDT 82889>LOG: restored log file
"000000010000000200000003" from archive
<2007-03-12 10:39:10 CDT 82889>LOG: checkpoint record is at 2/3035F48
<2007-03-12 10:39:10 CDT 82889>LOG: redo record is at 2/3000840;
undo record is at 0/0; shutdown FALSE
<2007-03-12 10:39:10 CDT 82889>LOG: next transaction ID: 0/18823426;
next OID: 8774996
<2007-03-12 10:39:10 CDT 82889>LOG: next MultiXactId: 1; next
MultiXactOffset: 0
<2007-03-12 10:39:10 CDT 82889>LOG: automatic recovery in progress
<2007-03-12 10:39:10 CDT 82889>LOG: redo starts at 2/3000840
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3000840; LSN 2/30024C0:
prev 2/3000818; xid 18823427; bkpb1: Heap - insert: rel
1663/16390/22670; tid 211239/23
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/30024C0; LSN 2/3004514:
prev 2/3000840; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8358831; tid 9103/77
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3004514; LSN 2/3005E9C:
prev 2/30024C0; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8359008; tid 36219/100
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3005E9C; LSN 2/3007958:
prev 2/3004514; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8359009; tid 15905/133
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3007958; LSN 2/3008AAC:
prev 2/3005E9C; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8359011; tid 13711/14
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3008AAC; LSN 2/3009D98:
prev 2/3007958; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8359012; tid 15516/235
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/3009D98; LSN 2/300B0D4:
prev 2/3008AAC; xid 18823427; bkpb1: Btree - insert: rel
1663/16390/8359024; tid 12394/300
<2007-03-12 10:39:10 CDT 82889>LOG: REDO @ 2/300B0D4; LSN 2/300B0FC:
prev 2/3009D98; xid 18823427: Transaction - commit: 2007-03-06 12:41:41
From the backup label file it seems the starting checkpoint record
should be at 1/F3FB4C3C. However the pg_control info indicates
2/3035F48. Which is correct?