On 24/04/18 13:57, Kyotaro HORIGUCHI wrote:
At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas <hlinn...@iki.fi> wrote in
<89e33d4f-5c75-0738-3dcb-44c4df59e...@iki.fi>
Looking at the patch linked above
(https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11693,6 +11693,10 @@ retry:
Assert(reqLen <= readLen);
*readTLI = curFileTLI;
+
+ if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
readBuf))
+ goto next_record_is_invalid;
+
return readLen;
next_record_is_invalid:
What is the point of adding this XLogReaderValidatePageHeader() call?
The caller of this callback function, ReadPageInternal(), checks the
page header anyway. Earlier in this thread, you said:
Without the lines, server actually fails to start replication.
(I try to remember the details...)
The difference is whether the function can cause retry for the
same portion of a set of continued records (without changing the
plugin API). XLogPageRead can do that. On the other hand all
ReadPageInternal can do is just letting the caller ReadRecords
retry from the beginning of the set of continued records since
the caller side handles only complete records.
In the failure case, in XLogPageRead, when read() fails, it can
try the next source midst of a continued records. On the other
hand if the segment was read but it was recycled one, it passes
"success" to ReadPageInternal and leads to retring from the
beginning of the recrod. Infinite loop.
I see. You have the same problem if you have a WAL file that's corrupt
in some other way, but one of the sources would have a correct copy.
ValidXLogPageHeader() only checks the page header, after all. But unlike
a recycled WAL segment, that's not supposed to happen as part of normal
operation, so I guess we can live with that.
Calling XLogReaderValidatePageHeader in ReadPageInternal is
redundant, but removing it may be interface change of xlogreader
plugin and I am not sure that is allowed.
We should avoid doing that in back-branches, at least. But in 'master',
I wouldn't mind redesigning the API. Dealing with all the retrying is
pretty complicated as it is, if we can simplify that somehow, I think
that'd be good.
- Heikki