On 2013-04-04 01:52:41 +0200, Andres Freund wrote:
> On 2013-04-03 15:57:49 -0700, Jeff Janes wrote:
> > I've changed the subject from "regression test failed when enabling
> > checksum" because I now know they are totally unrelated.
> >
> > My test case didn't need to depend on archiving being on, and so with a
> > simple tweak I rendered the two issues orthogonal.
> >
> > On Wed, Apr 3, 2013 at 12:15 PM, Jeff Davis <pg...@j-davis.com> wrote:
> >
> > > On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:
> > >
> > > > What I would probably really want is the data as it existed after the
> > > > crash but before recovery started, but since the postmaster
> > > > immediately starts recovery after the crash, I don't know of a good
> > > > way to capture this.
> > >
> > > Can you just turn off the restart_after_crash GUC? I had a chance to
> > > look at this, and seeing the block before and after recovery would be
> > > nice. I didn't see a log file in the data directory, but it didn't go
> > > through recovery, so I assume it already did that.
> >
> > You don't know that the cluster is in the bad state until after it goes
> > through recovery, because most crashes recover perfectly fine. So it would
> > have to make a side copy of the cluster after the crash, then recover the
> > original and see how things go, then either retain or delete the side copy.
> > Unfortunately my testing harness can't do this at the moment, because the
> > perl script storing the consistency info needs to survive over the crash
> > and recovery. It might take me a while to figure out how to make it do
> > this.
> >
> > I have the server log just go to stderr, where it gets mingled together
> > with messages from my testing harness. I had not uploaded that file.
> >
> > Here is a new upload. It contains both a data_dir tarball including xlog,
> > and the mingled stderr ("do_new.out"):
> >
> > https://drive.google.com/folderview?id=0Bzqrh1SO9FcEQmVzSjlmdWZvUHc&usp=sharing
> >
> > The other 3 files in it constitute the harness. It is quite a mess, with
> > some hard-coded paths. The just-posted fix for wal_keep_segments will also
> > have to be applied.
> >
> > > The block is corrupt as far as I can tell. The first third is written,
> > > and the remainder is all zeros. The header looks like this:
> >
> > Yes, that part is by my design. Why it didn't get fixed from a FPI is not
> > by my design, of course.
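For reference, a minimal sketch of the side-copy step described above,
assuming restart_after_crash = off so the postmaster exits after the crash
instead of launching recovery itself; the paths and the consistency-check
hook are hypothetical:

    import shutil
    import subprocess

    DATADIR = "/path/to/data_dir"            # hypothetical cluster location
    SIDE_COPY = DATADIR + ".pre-recovery"    # snapshot before any WAL replay

    # With restart_after_crash = off the postmaster has already exited after
    # the backend crash, so the on-disk state is still untouched by recovery.
    shutil.copytree(DATADIR, SIDE_COPY)

    # Recover the original and see how things go.
    subprocess.check_call(["pg_ctl", "-D", DATADIR, "-w", "start"])
    consistent = run_consistency_checks()    # hypothetical harness hook

    # Keep the pre-recovery snapshot only if recovery left a bad cluster.
    if consistent:
        shutil.rmtree(SIDE_COPY)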
> There are no FPIs (if you mean a full page image with that) afaics:
>
> Your logs tell us about recovery:
> 27692 2013-04-03 10:09:15.621 PDT:LOG: redo starts at 1/31000090
> 27692 2013-04-03 10:09:15.647 PDT:LOG: incorrect resource manager data checksum in record at 1/31169A68
> 27692 2013-04-03 10:09:15.647 PDT:LOG: redo done at 1/31169A38
>
> And then the error:
>
> 27698 SELECT 2013-04-03 10:09:16.688 PDT:WARNING: page verification failed, calculated checksum 16296 but expected 49517
> 27698 SELECT 2013-04-03 10:09:16.688 PDT:ERROR: invalid page in block 90 of relation base/16384/835589
>
> Looking at xlog from that time, the first records are:
> rmgr: XLOG    len (rec/tot):  72/  104, tx:        0, lsn: 1/31000028, prev 1/30000090, bkp: 0000, desc: checkpoint: redo 1/31000028; tli 1; prev tli 1; fpw true; xid 0/26254997; oid 835589; multi 1; offset 0; oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; online
> rmgr: XLOG    len (rec/tot):   4/   36, tx:        0, lsn: 1/31000090, prev 1/31000028, bkp: 0000, desc: nextOid: 843781
> rmgr: Storage len (rec/tot):  16/   48, tx:        0, lsn: 1/310000B8, prev 1/31000090, bkp: 0000, desc: file create: base/16384/835589
> rmgr: Heap    len (rec/tot):  37/ 7905, tx: 26254997, lsn: 1/310000E8, prev 1/310000B8, bkp: 1000, desc: hot_update: rel 1663/16384/12660; tid 0/47 xmax 26254997 ; new tid 0/33 xmax 0
>
> So the file was created after the last checkpoint, which means no full pages
> need to be logged, and we theoretically should be able to recover everything
> that happened to that file, including torn pages, from the WAL, since the WAL
> covers the file's entire existence.
>
> The bkp block 1 indicated in the 4th record above is the only one in
> that period of WAL, btw.
>
> Looking a bit more, the last page that's covered by WAL is:
> rmgr: Heap    len (rec/tot):  35/   67, tx: 26254998, lsn: 1/311699A8, prev 1/31169960, bkp: 0000, desc: insert: rel 1663/16384/835589; tid 44/56
>
> which is far earlier than the "block 90" the error is found in, unless I
> misunderstand something. Which is strange, since non-zero content for
> those pages shouldn't go to disk until the respective WAL is
> flushed. Huh, are we missing something here?
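That write-ahead rule can be checked directly against the relation file: no
on-disk page's pd_lsn should be past the point WAL has been flushed to. A
minimal sketch, assuming the default 8kB BLCKSZ, a little-endian machine, and
pd_lsn sitting at the very start of the page header as two uint32s (xlogid,
then xrecoff):

    import struct

    BLCKSZ = 8192                          # assuming the default block size
    FLUSHED = (0x1 << 32) | 0x31169A38     # "redo done at 1/31169A38"

    def page_lsns(path):
        """Yield (block number, 64-bit LSN) for each page of a segment file."""
        with open(path, "rb") as f:
            blkno = 0
            while True:
                page = f.read(BLCKSZ)
                if len(page) < BLCKSZ:
                    break
                # pd_lsn is the first page-header field; read it as two
                # little-endian uint32s: xlogid, then xrecoff.
                xlogid, xrecoff = struct.unpack_from("<II", page, 0)
                yield blkno, (xlogid << 32) | xrecoff
                blkno += 1

    for blkno, lsn in page_lsns("base/16384/835589"):
        if lsn > FLUSHED:
            print("block %d: LSN %X/%X is beyond the flushed WAL position"
                  % (blkno, lsn >> 32, lsn & 0xFFFFFFFF))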
Looking at the page LSNs with dd I noticed something peculiar:

page  0: 01 00 00 00 18 c2 00 31 => 1/3100C218
page  1: 01 00 00 00 80 44 01 31 => 1/31014480
page 10: 01 00 00 00 60 ce 05 31 => 1/3105CE60
page 43: 01 00 00 00 58 7a 16 31 => 1/31167A58
page 44: 01 00 00 00 f0 99 16 31 => 1/311699F0
page 45: 00 00 00 00 00 00 00 00 => 0/0
page 90: 01 00 00 00 90 17 1d 32 => 1/321D1790
page 91: 01 00 00 00 38 ef 1b 32 => 1/321BEF38

So we have written out pages that come after pages without any LSN, and whose
LSNs are *beyond* the point up to which XLOG has successfully been written to
disk (1/31169A38)?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services