Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Heikki Linnakangas
On 20.08.2012 18:25, Tom Lane wrote: Heikki Linnakangas writes: I was thinking that we might read gigabytes worth of bogus WAL into the memory buffer, if xl_tot_len is bogus and large, e.g 0x. But now that I look closer, the xlog record is validated after reading the first continuation

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Tom Lane
Heikki Linnakangas writes:
> On 20.08.2012 17:04, Tom Lane wrote:
>> Uh, no, you misread it. xl_tot_len is *zero* in this example. The
>> problem is that RecordIsValid believes xl_len (and backup block size)
>> even when it exceeds xl_tot_len.
> Ah yes, I see that now. I think all we need then
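For readers following the thread, the kind of cross-check being discussed can be sketched roughly as below. This is only an illustration under stated assumptions: DemoRecordHeader and demo_lengths_consistent are hypothetical names invented here, the struct is a drastically simplified stand-in for the real record header, and backup blocks are ignored. It is not the fix that was committed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header; NOT the real XLogRecord layout. */
typedef struct DemoRecordHeader
{
    uint32_t tot_len;   /* claimed total record length, header included */
    uint32_t data_len;  /* claimed length of the main data payload */
} DemoRecordHeader;

/*
 * Return true only if the claimed payload fits inside the claimed total
 * length. The point is to run this before any checksum is computed over
 * data_len bytes, so a bogus length cannot walk off the end of the buffer.
 */
static bool
demo_lengths_consistent(const DemoRecordHeader *hdr)
{
    /* A total length smaller than the header itself (including 0) is bogus. */
    if (hdr->tot_len < sizeof(DemoRecordHeader))
        return false;

    /* Subtract instead of adding so the comparison cannot overflow. */
    return hdr->data_len <= hdr->tot_len - sizeof(DemoRecordHeader);
}
```

With a check of this shape, a header claiming a total length of zero but a large payload length is rejected up front instead of being trusted by the checksum loop.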

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Heikki Linnakangas
On 20.08.2012 17:04, Tom Lane wrote:
> Heikki Linnakangas writes:
>> On 18.08.2012 08:52, Amit kapila wrote:
>>> I think that missing check of total length has caused this problem.
>>> However now this check will be different.
>> That check still exists, in ValidXLogRecordHeader(). However, we now allocat

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Andres Freund
On Monday, August 20, 2012 04:04:52 PM Tom Lane wrote:
> Heikki Linnakangas writes:
> > On 18.08.2012 08:52, Amit kapila wrote:
> >> I think that missing check of total length has caused this problem.
> >> However now this check will be different.
> >
> > That check still exists, in ValidXLogReco

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Tom Lane
Heikki Linnakangas writes:
> On 18.08.2012 08:52, Amit kapila wrote:
>> I think that missing check of total length has caused this problem. However
>> now this check will be different.
> That check still exists, in ValidXLogRecordHeader(). However, we now
> allocate the buffer for the whole rec

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-19 Thread Heikki Linnakangas
On 18.08.2012 08:52, Amit kapila wrote: Tom Lane Sent: Saturday, August 18, 2012 7:16 AM so it merrily tries to compute a checksum on a gigabyte worth of data, and soon falls off the end of memory. In reality, inspection of the WAL file suggests that this is the end of valid data and what sh

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-17 Thread Amit kapila
Tom Lane Sent: Saturday, August 18, 2012 7:16 AM
> The startup process's stack trace is
> #0 0x26fd1c in RecordIsValid (record=0x4008d7a0, recptr=80658424, emode=15)
>    at xlog.c:3713
> 3713            COMP_CRC32(crc, XLogRecGetData(record), len);
> (gdb) bt
> #0 0x26fd1c in RecordIsValid (re