On 2013-04-04 01:52:41 +0200, Andres Freund wrote:
> On 2013-04-03 15:57:49 -0700, Jeff Janes wrote:
> > I've changed the subject from "regression test failed when enabling
> > checksum" because I now know they are totally unrelated.
> >
> > My test case didn't need to depend on archiving being on, and so with a
> > simple tweak I rendered the two issues orthogonal.
> >
> > On Wed, Apr 3, 2013 at 12:15 PM, Jeff Davis <pg...@j-davis.com> wrote:
> >
> > > On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:
> > >
> > > > What I would probably really want is the data as it existed after the
> > > > crash but before recovery started, but since the postmaster
> > > > immediately starts recovery after the crash, I don't know of a good
> > > > way to capture this.
> > >
> > > Can you just turn off the restart_after_crash GUC? I had a chance to
> > > look at this, and seeing the block before and after recovery would be
> > > nice. I didn't see a log file in the data directory, but it didn't go
> > > through recovery, so I assume it already did that.
> >
> > You don't know that the cluster is in the bad state until after it goes
> > through recovery, because most crashes recover perfectly fine. So it would
> > have to make a side copy of the cluster after the crash, then recover the
> > original and see how things go, then either retain or delete the side copy.
> > Unfortunately my testing harness can't do this at the moment, because the
> > perl script storing the consistency info needs to survive over the crash
> > and recovery. It might take me a while to figure out how to make it do
> > this.
> >
> > I have the server log just go to stderr, where it gets mingled together
> > with messages from my testing harness. I had not uploaded that file.
> >
> > Here is a new upload. It contains both a data_dir tarball including xlog,
> > and the mingled stderr ("do_new.out"):
> >
> > https://drive.google.com/folderview?id=0Bzqrh1SO9FcEQmVzSjlmdWZvUHc&usp=sharing
> >
> > The other 3 files in it constitute the harness. It is quite a mess, with
> > some hard-coded paths. The just-posted fix for wal_keep_segments will also
> > have to be applied.
> >
> > > The block is corrupt as far as I can tell. The first third is written,
> > > and the remainder is all zeros. The header looks like this:
> >
> > Yes, that part is by my design. Why it didn't get fixed from a FPI is not
> > by my design, of course.
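For reference, a minimal sketch of the side-copy step described above,
assuming restart_after_crash = off so the postmaster exits after the crash
instead of launching recovery itself; the paths and the consistency-check
hook are hypothetical:

    import shutil
    import subprocess

    DATADIR = "/path/to/data_dir"            # hypothetical cluster location
    SIDE_COPY = DATADIR + ".pre-recovery"    # snapshot before any WAL replay

    # With restart_after_crash = off the postmaster has already exited after
    # the backend crash, so the on-disk state is still untouched by recovery.
    shutil.copytree(DATADIR, SIDE_COPY)

    # Recover the original and see how things go.
    subprocess.check_call(["pg_ctl", "-D", DATADIR, "-w", "start"])
    consistent = run_consistency_checks()    # hypothetical harness hook

    # Keep the pre-recovery snapshot only if recovery left a bad cluster.
    if consistent:
        shutil.rmtree(SIDE_COPY)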
> There are no FPIs (if you mean a full page image with that) afaics:
>
> Your logs tell us about recovery:
> 27692 2013-04-03 10:09:15.621 PDT:LOG: redo starts at 1/31000090
> 27692 2013-04-03 10:09:15.647 PDT:LOG: incorrect resource manager data checksum in record at 1/31169A68
> 27692 2013-04-03 10:09:15.647 PDT:LOG: redo done at 1/31169A38
>
> And then the error:
>
> 27698 SELECT 2013-04-03 10:09:16.688 PDT:WARNING: page verification failed, calculated checksum 16296 but expected 49517
> 27698 SELECT 2013-04-03 10:09:16.688 PDT:ERROR: invalid page in block 90 of relation base/16384/835589
>
> Looking at xlog from that time, the first records are:
> rmgr: XLOG    len (rec/tot):  72/  104, tx:        0, lsn: 1/31000028, prev 1/30000090, bkp: 0000, desc: checkpoint: redo 1/31000028; tli 1; prev tli 1; fpw true; xid 0/26254997; oid 835589; multi 1; offset 0; oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; online
> rmgr: XLOG    len (rec/tot):   4/   36, tx:        0, lsn: 1/31000090, prev 1/31000028, bkp: 0000, desc: nextOid: 843781
> rmgr: Storage len (rec/tot):  16/   48, tx:        0, lsn: 1/310000B8, prev 1/31000090, bkp: 0000, desc: file create: base/16384/835589
> rmgr: Heap    len (rec/tot):  37/ 7905, tx: 26254997, lsn: 1/310000E8, prev 1/310000B8, bkp: 1000, desc: hot_update: rel 1663/16384/12660; tid 0/47 xmax 26254997 ; new tid 0/33 xmax 0
>
> So the file was created after the last checkpoint, which means no full pages
> need to be logged, and we theoretically should be able to recover everything
> that happened to that file, including torn pages, from the WAL, since the WAL
> covers the file's entire existence.
>
> The bkp block 1 indicated in the 4th record above is the only one in
> that period of WAL, btw.
>
> Looking a bit more, the last page that's covered by WAL is:
> rmgr: Heap    len (rec/tot):  35/   67, tx: 26254998, lsn: 1/311699A8, prev 1/31169960, bkp: 0000, desc: insert: rel 1663/16384/835589; tid 44/56
>
> which is far earlier than the "block 90" the error is found in, unless I
> misunderstand something. Which is strange, since non-zero content for
> those pages shouldn't go to disk until the respective WAL is
> flushed. Huh, are we missing something here?
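That write-ahead rule can be checked directly against the relation file: no
on-disk page's pd_lsn should be past the point WAL has been flushed to. A
minimal sketch, assuming the default 8kB BLCKSZ, a little-endian machine, and
pd_lsn sitting at the very start of the page header as two uint32s (xlogid,
then xrecoff):

    import struct

    BLCKSZ = 8192                          # assuming the default block size
    FLUSHED = (0x1 << 32) | 0x31169A38     # "redo done at 1/31169A38"

    def page_lsns(path):
        """Yield (block number, 64-bit LSN) for each page of a segment file."""
        with open(path, "rb") as f:
            blkno = 0
            while True:
                page = f.read(BLCKSZ)
                if len(page) < BLCKSZ:
                    break
                # pd_lsn is the first page-header field; read it as two
                # little-endian uint32s: xlogid, then xrecoff.
                xlogid, xrecoff = struct.unpack_from("<II", page, 0)
                yield blkno, (xlogid << 32) | xrecoff
                blkno += 1

    for blkno, lsn in page_lsns("base/16384/835589"):
        if lsn > FLUSHED:
            print("block %d: LSN %X/%X is beyond the flushed WAL position"
                  % (blkno, lsn >> 32, lsn & 0xFFFFFFFF))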
Looking at the page LSNs with dd I noticed something peculiar:

page  0: 01 00 00 00 18 c2 00 31 => 1/3100C218
page  1: 01 00 00 00 80 44 01 31 => 1/31014480
page 10: 01 00 00 00 60 ce 05 31 => 1/3105CE60
page 43: 01 00 00 00 58 7a 16 31 => 1/31167A58
page 44: 01 00 00 00 f0 99 16 31 => 1/311699F0
page 45: 00 00 00 00 00 00 00 00 => 0/0
page 90: 01 00 00 00 90 17 1d 32 => 1/321D1790
page 91: 01 00 00 00 38 ef 1b 32 => 1/321BEF38

So we have written out pages that come after pages without any LSN, and whose
LSNs are *beyond* the point up to which XLOG has successfully been written to
disk (1/31169A38)?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services