The scenario is the same as before (the 11 steps), but the CREATE TABLE and COPY step has been reduced to:
CREATE TABLE test0 (filler VARCHAR(120)); COPY test0 FROM '/data0/dump/test0.dat' USING DELIMITERS ',';
Now the file 'test0.dat' consists of (128293) identical lines, each of 109 'a' charactors (plus end of line)
A script to run the whole business can be found here :
http://homepages.paradise.net.nz/markir/download/pitr-bug.tar.gz
(It will need a bit of editing for things like location of Pg, PGDATA, and you will need to make your own data file)
The main points of interest are:
- anything <=128392 rows in test0.dat results in 1 archived log, and the recovery succeeds
- anything >=128393 rows in test0.dat results in 2 or more archived logs, and recovery fails on the second log (and gives the zero length redo at 0/1FFFFE0 message).
Let me know if I can do any more legwork on this (I am considering re-compiling with WAL_DEBUG now that example is simpler)
regards
Mark
Simon Riggs wrote:
On Thu, 2004-07-15 at 10:47, Mark Kirkwood wrote:
I tried what I thought was a straightforward scenario, and seem to have broken it :-(
Here is the little tale
1) initdb 2) set archive_mode and archive_dest in postgresql.conf 3) startup 4) create database called 'test' 5) connect to 'test' and type 'checkpoint' 6) backup PGDATA using 'tar -zcvf' 7) create tables in 'test' and add data using COPY (exactly 2 logs worth) 8) shutdown and remove PGDATA 9) recover using 'tar -zxvf' 10) copy recovery.conf into PGDATA 11) startup
This is what I get :
LOG: database system was interrupted at 2004-07-15 21:24:04 NZST
LOG: recovery command file found...
LOG: restore_program = cp %s/%s %s
LOG: recovery_target_inclusive = true
LOG: recovery_debug_log = true
LOG: starting archive recovery
LOG: restored log file "0000000000000000" from archive
LOG: checkpoint record is at 0/A48054
LOG: redo record is at 0/A48054; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 496; next OID: 25419
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/A48094
LOG: restored log file "0000000000000001" from archive
LOG: record with zero length at 0/1FFFFE0
LOG: redo done at 0/1FFFF30
LOG: restored log file "0000000000000001" from archive
LOG: restored log file "0000000000000001" from archive
PANIC: concurrent transaction log activity while database system is shutting down
LOG: startup process (PID 13492) was terminated by signal 6
LOG: aborting startup due to startup process failure
The concurrent access is a bit of a puzzle, as this is my home machine (i.e. I am *sure* noone else is connected!)
I can see what is wrong now, but you'll have to help me on details your end...
The log shows that xlog 1 was restored from archive. It contains a zero length record, which indicates that it isn't yet full (or thats what the existing recovery code assumes it means). Which also indicates that it should never have been archived in the first place, and should not therefore be a candidate for a restore from archive.
The double message "restored log file" can only occur after you've retrieved a partially full file from archive - which as I say, shouldn't be there.
Other messages are essentially spurious in those circumstances.
Either: - somehow the files have been mixed up in the archive directory, which is possible if the filing discipline is not strict - various ways, unfortunately I would guess this to be the most likely, somehow - the file that has been restored has been damaged in some way - the archiver has archived a file too early (very unlikely, IMHO - thats the most robust bit of the code) - some aspect of the code has written a zero length record to WAL (which is supposed to not be possible, but we musn't discount an error in recent committed work)
- there may also be an effect going on with checkpoints that I don't understand...spurious checkpoint warning messages have already been observed and reported,
Best regards, Simon Riggs
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]