On 2013-04-03 15:57:49 -0700, Jeff Janes wrote:
> I've changed the subject from "regression test failed when enabling
> checksum" because I now know they are totally unrelated.
>
> My test case didn't need to depend on archiving being on, and so with a
> simple tweak I rendered the two issues orthogonal.
>
> On Wed, Apr 3, 2013 at 12:15 PM, Jeff Davis <pg...@j-davis.com> wrote:
>
> > On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:
> >
> > > What I would probably really want is the data as it existed after the
> > > crash but before recovery started, but since the postmaster
> > > immediately starts recovery after the crash, I don't know of a good
> > > way to capture this.
> >
> > Can you just turn off the restart_after_crash GUC? I had a chance to
> > look at this, and seeing the block before and after recovery would be
> > nice. I didn't see a log file in the data directory, but it didn't go
> > through recovery, so I assume it already did that.
>
> You don't know that the cluster is in the bad state until after it goes
> through recovery, because most crashes recover perfectly fine. So it would
> have to make a side-copy of the cluster after the crash, then recover the
> original and see how things go, then either retain or delete the side-copy.
> Unfortunately my testing harness can't do this at the moment, because the
> perl script storing the consistency info needs to survive over the crash
> and recovery. It might take me a while to figure out how to make it do
> this.
>
> I have the server log just go to stderr, where it gets mingled together
> with messages from my testing harness. I had not uploaded that file.
>
> Here is a new upload. It contains both a data_dir tarball including xlog,
> and the mingled stderr ("do_new.out"):
>
> https://drive.google.com/folderview?id=0Bzqrh1SO9FcEQmVzSjlmdWZvUHc&usp=sharing
>
> The other 3 files in it constitute the harness. It is quite a mess, with
> some hard-coded paths. The just-posted fix for wal_keep_segments will also
> have to be applied.
>
> > The block is corrupt as far as I can tell. The first third is written,
> > and the remainder is all zeros. The header looks like this:
>
> Yes, that part is by my design. Why it didn't get fixed from a FPI is not
> by my design, of course.
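As an aside, the "side-copy of the cluster after the crash, then recover the
original" step described above could be scripted along these lines. This is
only a rough sketch in Python, not the actual perl harness; the paths and the
way the crash is simulated are made up:

#!/usr/bin/env python
# Sketch: snapshot a crashed cluster before recovery runs, then let the
# original recover; keep the snapshot for autopsy if the checks fail.
import shutil
import subprocess

PGDATA = "/tmp/crashtest/data"              # made-up cluster location
SIDECOPY = "/tmp/crashtest/data.postcrash"  # made-up snapshot location

# Simulate a crash: an immediate-mode stop skips the shutdown checkpoint.
# For a real backend crash you would also want restart_after_crash = off,
# so the postmaster doesn't run recovery before the copy is taken.
subprocess.call(["pg_ctl", "-D", PGDATA, "-m", "immediate", "stop"])

# Side-copy of the cluster exactly as it looked at crash time.
shutil.copytree(PGDATA, SIDECOPY)

# Now let the original go through normal crash recovery.
subprocess.call(["pg_ctl", "-D", PGDATA, "-w",
                 "-l", PGDATA + "/startup.log", "start"])

# Run the consistency checks against the recovered cluster here;
# retain SIDECOPY if they fail, delete it otherwise.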
There are no FPIs (if you mean full page images with that) afaics. Your logs
tell us about recovery:

27692 2013-04-03 10:09:15.621 PDT:LOG: redo starts at 1/31000090
27692 2013-04-03 10:09:15.647 PDT:LOG: incorrect resource manager data checksum in record at 1/31169A68
27692 2013-04-03 10:09:15.647 PDT:LOG: redo done at 1/31169A38

And then the error:

27698 SELECT 2013-04-03 10:09:16.688 PDT:WARNING: page verification failed, calculated checksum 16296 but expected 49517
27698 SELECT 2013-04-03 10:09:16.688 PDT:ERROR: invalid page in block 90 of relation base/16384/835589

Looking at the xlog from that time, the first records are:

rmgr: XLOG    len (rec/tot): 72/ 104, tx:        0, lsn: 1/31000028, prev 1/30000090, bkp: 0000, desc: checkpoint: redo 1/31000028; tli 1; prev tli 1; fpw true; xid 0/26254997; oid 835589; multi 1; offset 0; oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; online
rmgr: XLOG    len (rec/tot):  4/  36, tx:        0, lsn: 1/31000090, prev 1/31000028, bkp: 0000, desc: nextOid: 843781
rmgr: Storage len (rec/tot): 16/  48, tx:        0, lsn: 1/310000B8, prev 1/31000090, bkp: 0000, desc: file create: base/16384/835589
rmgr: Heap    len (rec/tot): 37/7905, tx: 26254997, lsn: 1/310000E8, prev 1/310000B8, bkp: 1000, desc: hot_update: rel 1663/16384/12660; tid 0/47 xmax 26254997 ; new tid 0/33 xmax 0

So the file was created after the last checkpoint, which means no full pages
need to be logged for it. And we theoretically should be able to recover from
things like torn pages using the WAL alone, since it should cover everything
that happened to that file. The bkp block 1 indicated in the 4th record above
is the only one in that period of WAL, btw.

Looking a bit more, the last page that's covered by WAL is:

rmgr: Heap    len (rec/tot): 35/  67, tx: 26254998, lsn: 1/311699A8, prev 1/31169960, bkp: 0000, desc: insert: rel 1663/16384/835589; tid 44/56

which is far earlier than the block 90 the error is reported for, unless I
misunderstand something. Which is strange, since non-zero content for those
pages shouldn't go to disk until the respective WAL is flushed.

Huh, are we missing something here?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
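PS: for anyone who wants to poke at the suspect block directly, something
along these lines prints its header, so the page's pd_lsn can be compared
with the "redo done" location above. Only a sketch, assuming 8kB blocks, a
little-endian machine and the usual PageHeaderData layout:

# Dump the header of block 90 of base/16384/835589 so its pd_lsn can be
# compared with the last replayed WAL record (1/31169A38 above).
import struct

BLCKSZ = 8192
PATH = "base/16384/835589"   # relative to the data directory
BLKNO = 90

with open(PATH, "rb") as f:
    f.seek(BLKNO * BLCKSZ)
    page = f.read(BLCKSZ)

# pd_lsn (stored as two 32 bit halves on the page), pd_checksum, pd_flags,
# pd_lower, pd_upper, pd_special, pd_pagesize_version, pd_prune_xid
(lsn_hi, lsn_lo, checksum, flags, lower, upper, special,
 pagesize_version, prune_xid) = struct.unpack("<IIHHHHHHI", page[:24])

print("pd_lsn:      %X/%X" % (lsn_hi, lsn_lo))
print("pd_checksum: %u" % checksum)
print("pd_lower/pd_upper/pd_special: %u/%u/%u" % (lower, upper, special))
print("trailing zero bytes: %u" % (len(page) - len(page.rstrip(b"\0"))))

If that pd_lsn points past everything the WAL contains, it would confirm that
the page hit the disk ahead of its WAL.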