Thank you for adding the detailed comment and commiting. At Sat, 5 May 2018 01:49:31 +0300, Heikki Linnakangas <hlinn...@iki.fi> wrote in <47215279-228d-f30d-35d1-16af695e5...@iki.fi> > On 04/05/18 10:05, Heikki Linnakangas wrote: > > On 24/04/18 13:57, Kyotaro HORIGUCHI wrote: > >> At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas > >> <hlinn...@iki.fi> wrote in > >> <89e33d4f-5c75-0738-3dcb-44c4df59e...@iki.fi> > >>> Looking at the patch linked above > >>> (https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp): > >>> > >>>> --- a/src/backend/access/transam/xlog.c > >>>> +++ b/src/backend/access/transam/xlog.c > >>>> @@ -11693,6 +11693,10 @@ retry: > >>>> Assert(reqLen <= readLen); > >>>> *readTLI = curFileTLI; > >>>> + > >>>> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, > >>>> readBuf)) > >>>> + goto next_record_is_invalid; > >>>> + > >>>> return readLen; > >>>> next_record_is_invalid: > >>> > >>> What is the point of adding this XLogReaderValidatePageHeader() call? > >>> The caller of this callback function, ReadPageInternal(), checks the > >>> page header anyway. Earlier in this thread, you said: > >> > >> Without the lines, server actually fails to start replication. > >> > >> (I try to remember the details...) > >> > >> The difference is whether the function can cause retry for the > >> same portion of a set of continued records (without changing the > >> plugin API). XLogPageRead can do that. On the other hand all > >> ReadPageInternal can do is just letting the caller ReadRecords > >> retry from the beginning of the set of continued records since > >> the caller side handles only complete records. > >> > >> In the failure case, in XLogPageRead, when read() fails, it can > >> try the next source midst of a continued records. On the other > >> hand if the segment was read but it was recycled one, it passes > >> "success" to ReadPageInternal and leads to retring from the > >> beginning of the recrod. Infinite loop. > > I see. You have the same problem if you have a WAL file that's corrupt > > in some other way, but one of the sources would have a correct copy. > > ValidXLogPageHeader() only checks the page header, after all. But > > unlike > > a recycled WAL segment, that's not supposed to happen as part of > > normal > > operation, so I guess we can live with that.
Anyway we read successive complete records from different sources so I think if such curruption causes a problem we would have faced the same problem even without this. > Pushed this now, after adding some comments. Thanks! Thanks! > >> Calling XLogReaderValidatePageHeader in ReadPageInternal is > >> redundant, but removing it may be interface change of xlogreader > >> plugin and I am not sure that is allowed. > > We should avoid doing that in back-branches, at least. But in > > 'master', > > I wouldn't mind redesigning the API. Dealing with all the retrying is > > pretty complicated as it is, if we can simplify that somehow, I think > > that'd be good. Agreed. > I spent some time musing on what a better API might look like. We > could remove the ReadPage callback, and instead have XLogReadRecord > return a special return code to mean "give me more data". I'm thinking > of something like: Sounds reasonable. That makes an additional return/call iteration to XLogReadRecord but it would be ignorable comparing to the cost of XLogPageRead. > /* return code of XLogReadRecord() */ > typedef enum > { > XLREAD_SUCCESS, > XLREAD_INVALID_RECORD, /* a record was read, but it was corrupt */ > XLREAD_INVALID_PAGE, /* the page that was supplied looks invalid. */ > XLREAD_NEED_DATA, /* caller should place more data in buffer, and > retry */ > } XLogReadRecord_Result; > > > And the calls to XLogReadRecord() would look something like this: > > for(;;) > { > rc = XLogReadRecord(reader, startptr, errormsg); > > if (rc == XLREAD_SUCCESS) > { > /* great, got record */ > } > if (rc == XLREAD_INVALID_PAGE || XLREAD_INVALID_RECORD) > { > elog(ERROR, "invalid record"); > } > if (rc == XLREAD_NEED_DATA) > { > /* > * Read a page from disk, and place it into reader->readBuf > */ > XLogPageRead(reader->readPagePtr, /* page to read */ > reader->reqLen /* # of bytes to read */ ); > /* > * Now that we have read the data that XLogReadRecord() > * requested, call it again. > */ > continue; > } > } > > So instead of having a callback, XLogReadRecord() would return > XLREAD_NEED_DATA. The caller would then need to place that data into > the buffer, and call it again. If a record spans multiple pages, > XLogReadRecord() would return with XLREAD_NEED_DATA multiple times, to > read each page. It seems easier to understand at a glance. In short, I like it. > The important difference for the bug we're discussing on this thread > is is that if you passed an invalid page to XLogReadRecord(), it would > return with XLREAD_INVALID_PAGE. You could then try reading the same > page from a different source, and call XLogReadRecord() again, and it > could continue where it was left off, even if it was in the middle of > a continuation record. We will have more control on how to continue reading WAL with this than the current code and it seems preferable as the whole. > This is clearly not backpatchable, but maybe something to think about > for v12. Are you planning to working on this shortly? regards. -- Kyotaro Horiguchi NTT Open Source Software Center