On 04/05/18 10:05, Heikki Linnakangas wrote:
On 24/04/18 13:57, Kyotaro HORIGUCHI wrote:
At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas <hlinn...@iki.fi> wrote in 
<89e33d4f-5c75-0738-3dcb-44c4df59e...@iki.fi>
Looking at the patch linked above
(https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11693,6 +11693,10 @@ retry:
        Assert(reqLen <= readLen);
        *readTLI = curFileTLI;
+
+ if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
readBuf))
+               goto next_record_is_invalid;
+
        return readLen;
    next_record_is_invalid:

What is the point of adding this XLogReaderValidatePageHeader() call?
The caller of this callback function, ReadPageInternal(), checks the
page header anyway. Earlier in this thread, you said:

Without the lines, server actually fails to start replication.

(I try to remember the details...)

The difference is whether the function can cause retry for the
same portion of a set of continued records (without changing the
plugin API). XLogPageRead can do that. On the other hand all
ReadPageInternal can do is just letting the caller ReadRecords
retry from the beginning of the set of continued records since
the caller side handles only complete records.

In the failure case, in XLogPageRead, when read() fails, it can
try the next source midst of a continued records. On the other
hand if the segment was read but it was recycled one, it passes
"success" to ReadPageInternal and leads to retring from the
beginning of the recrod. Infinite loop.

I see. You have the same problem if you have a WAL file that's corrupt
in some other way, but one of the sources would have a correct copy.
ValidXLogPageHeader() only checks the page header, after all. But unlike
a recycled WAL segment, that's not supposed to happen as part of normal
operation, so I guess we can live with that.

Pushed this now, after adding some comments. Thanks!

Calling XLogReaderValidatePageHeader in ReadPageInternal is
redundant, but removing it may be interface change of xlogreader
plugin and I am not sure that is allowed.

We should avoid doing that in back-branches, at least. But in 'master',
I wouldn't mind redesigning the API. Dealing with all the retrying is
pretty complicated as it is, if we can simplify that somehow, I think
that'd be good.

I spent some time musing on what a better API might look like. We could remove the ReadPage callback, and instead have XLogReadRecord return a special return code to mean "give me more data". I'm thinking of something like:

/* return code of XLogReadRecord() */
typedef enum
{
    XLREAD_SUCCESS,
    XLREAD_INVALID_RECORD,  /* a record was read, but it was corrupt */
    XLREAD_INVALID_PAGE,    /* the page that was supplied looks invalid. */
XLREAD_NEED_DATA, /* caller should place more data in buffer, and retry */
} XLogReadRecord_Result;


And the calls to XLogReadRecord() would look something like this:

for(;;)
{
    rc = XLogReadRecord(reader, startptr, errormsg);

    if (rc == XLREAD_SUCCESS)
    {
        /* great, got record */
    }
    if (rc == XLREAD_INVALID_PAGE || XLREAD_INVALID_RECORD)
    {
        elog(ERROR, "invalid record");
    }
    if (rc == XLREAD_NEED_DATA)
    {
        /*
         * Read a page from disk, and place it into reader->readBuf
         */
        XLogPageRead(reader->readPagePtr, /* page to read */
                     reader->reqLen       /* # of bytes to read */ );
        /*
         * Now that we have read the data that XLogReadRecord()
         * requested, call it again.
         */
        continue;
    }
}

So instead of having a callback, XLogReadRecord() would return XLREAD_NEED_DATA. The caller would then need to place that data into the buffer, and call it again. If a record spans multiple pages, XLogReadRecord() would return with XLREAD_NEED_DATA multiple times, to read each page.

The important difference for the bug we're discussing on this thread is is that if you passed an invalid page to XLogReadRecord(), it would return with XLREAD_INVALID_PAGE. You could then try reading the same page from a different source, and call XLogReadRecord() again, and it could continue where it was left off, even if it was in the middle of a continuation record.

This is clearly not backpatchable, but maybe something to think about for v12.

- Heikki

Reply via email to