Re: Make mesage at end-of-recovery less scary.

Michael Paquier Mon, 08 Nov 2021 16:53:46 -0800

On Mon, Nov 08, 2021 at 02:59:46PM +0900, Kyotaro Horiguchi wrote:
> While checking the behavior for the case of missing-contrecord, I
> noticed that emode_for_corrupt_record() doesn't work as expected since
> readSource is reset to XLOG_FROM_ANY after a read failure. We could
> remember the last failed source but pg_wal should have been visited if
> page read error happened so I changed the function so that it treats
> XLOG_FROM_ANY the same way with XLOG_FROM_PG_WAL.


FWIW, I am not much a fan of assuming that it is fine to use
XLOG_FROM_ANY as a condition here.  The comments on top of
emode_for_corrupt_record() make it rather clear what the expectations
are, and this is the default readSource.

> (Otherwise we see "LOG: reached end-of-WAL at .." message after
>  "WARNING: missing contrecord at.." message.)

+      /* broken record found */
+      ereport(WARNING,
+                      (errmsg("redo is skipped"),
+                       errhint("This suggests WAL file corruption. You might 
need to check the database.")));
This looks rather scary to me, FWIW, and this could easily be reached
if one forgets about EndOfWAL while hacking on xlogreader.c.
Unlikely so, still.

+       report_invalid_record(state,
+                             "missing contrecord at %X/%X",
+                             LSN_FORMAT_ARGS(RecPtr));
Isn't there a risk here to break applications checking after error
messages stored in the WAL reader after seeing a contrecord?

+   if (record->xl_tot_len == 0)
+   {
+       /* This is strictly not an invalid state, so phrase it as so. */
+       report_invalid_record(state,
+                             "record length is 0 at %X/%X",
+                             LSN_FORMAT_ARGS(RecPtr));
+       state->EndOfWAL = true;
+       return false;
+   }
This assumes that a value of 0 for xl_tot_len is a synonym of the end
of WAL, but cannot we have also a corrupted record in this case in the
shape of xl_tot_len being 0?  We validate the full record after
reading the header, so it seems to me that we should not assume that
things are just ending as proposed in this patch.
--
Michael

signature.asc
Description: PGP signature

Re: Make mesage at end-of-recovery less scary.

Reply via email to