On 2013-04-03 20:45:51 -0400, Tom Lane wrote:
> and...@anarazel.de (Andres Freund) writes:
> > Looking at the page lsn's with dd I noticed something peculiar:
> >
> > page 0:
> > 01 00 00 00 18 c2 00 31 => 1/3100C218
> > page 1:
> > 01 00 00 00 80 44 01 31 => 1/31014480
> > page 10:
> > 01 00 00 00 60 ce 05 31 => 1/3105ce60
> > page 43:
> > 01 00 00 00 58 7a 16 31 => 1/31167a58
> > page 44:
> > 01 00 00 00 f0 99 16 31 => 1/311699f0
> > page 45:
> > 00 00 00 00 00 00 00 00 => 0/0
> > page 90:
> > 01 00 00 00 90 17 1d 32 => 1/321d1790
> > page 91:
> > 01 00 00 00 38 ef 1b 32 => 1/321bef38
> >
> > So we have written out pages that are after pages without a LSN that
> > have an LSN thats *beyond* the point XLOG has successfully been written
> > to disk (1/31169A38)?
>
> If you're looking into the FPIs, those would contain the page's older
> LSN, not the one assigned by the current WAL record.

Nope, that's from the heap, and the LSNs are *newer* than what startup
recovered to. I am pretty sure by now that we are missing valid WAL; I am
just not sure why.

Unfortunately we can't easily diagnose what happened at:

27692 2013-04-03 10:09:15.647 PDT:LOG: incorrect resource manager data checksum in record at 1/31169A68

since the startup process wrote its end-of-recovery checkpoint there:

rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 1/31169A68, prev 1/31169A38, bkp: 0000, desc: checkpoint: redo 1/31169A68; tli 1; prev tli 1; fpw true; xid 0/26254999; oid 843781; multi 1; offset 0; oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown

Some blocks later in that WAL segment, the records continue again:

pg_xlogdump /tmp/tmp/data2/pg_xlog/000000010000000100000031 -s 1/3116c000 -n 10
first record is after 1/3116C000, at 1/3116D9D8, skipping over 6616 bytes
rmgr: Heap len (rec/tot): 51/ 83, tx: 26254999, lsn: 1/3116D9D8, prev 1/3116BA20, bkp: 0000, desc: update: rel 1663/16384/835589; tid 38/148 xmax 26254999 ; new tid 44/57 xmax 0
rmgr: Btree len (rec/tot): 34/ 66, tx: 26254999, lsn: 1/3116DA30, prev 1/3116D9D8, bkp: 0000, desc: insert: rel 1663/16384/835590; tid 25/319
rmgr: Heap len (rec/tot): 51/ 83, tx: 26255000, lsn: 1/3116DA78, prev 1/3116DA30, bkp: 0000, desc: update: rel 1663/16384/835589; tid 19/214 xmax 26255000 ; new tid 44/58 xmax 0

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
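
For anyone who wants to repeat the comparison, here is a rough sketch of how the dd output quoted at the top maps to LSNs and how to check them against the last successfully written WAL position (1/31169A38). It assumes the first 8 bytes of each page are the page LSN, stored as two 32-bit words (high part, then offset) in little-endian byte order, which matches the byte/LSN pairs shown above. The helper names decode_page_lsn and format_lsn are made up for this illustration and are not part of any PostgreSQL tool.

```python
import struct

def decode_page_lsn(hex_bytes):
    """Decode the first 8 page bytes (as a hex string) into (xlogid, xrecoff).

    Assumption: little-endian host, page LSN stored as two uint32 words.
    """
    raw = bytes.fromhex(hex_bytes)
    return struct.unpack("<II", raw[:8])

def format_lsn(lsn):
    """Render an (xlogid, xrecoff) pair in the usual X/X notation."""
    return "%X/%X" % lsn

# Page number -> first 8 bytes, copied from the dd output quoted above.
pages = {
    0:  "01 00 00 00 18 c2 00 31",
    1:  "01 00 00 00 80 44 01 31",
    10: "01 00 00 00 60 ce 05 31",
    43: "01 00 00 00 58 7a 16 31",
    44: "01 00 00 00 f0 99 16 31",
    45: "00 00 00 00 00 00 00 00",
    90: "01 00 00 00 90 17 1d 32",
    91: "01 00 00 00 38 ef 1b 32",
}

# Last position reported as successfully written, per the quoted text.
written_upto = (0x1, 0x31169A38)

decoded = {page: decode_page_lsn(data) for page, data in pages.items()}

# Tuples compare element-wise, so (xlogid, xrecoff) pairs order like LSNs.
beyond = sorted(page for page, lsn in decoded.items() if lsn > written_upto)

for page in sorted(decoded):
    marker = "  <-- beyond written WAL" if page in beyond else ""
    print("page %3d: %s%s" % (page, format_lsn(decoded[page]), marker))
```

With the values above, only the pages past the LSN-less page 45 should get flagged, which is the oddity described in this thread.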