Fail hard if xlogreader.c fails on out-of-memory

Michael Paquier Tue, 26 Sep 2023 00:39:02 -0700

Hi all,
(Thomas in CC.)

Now that becfbdd6c1c9 has improved the situation to detect the
difference between out-of-memory and invalid WAL data in WAL, I guess
that it is time to tackle the problem of what we should do when
reading WAL records bit fail on out-of-memory.


To summarize, currently the WAL reader APIs fail the same way if we
detect some incorrect WAL record or if a memory allocation fails: an
error is generated and returned back to the caller to consume.  For
WAL replay, not being able to make the difference between an OOM and
the end-of-wal is a problem in some cases.  For example, in crash
recovery, failing an internal allocation will be detected as the
end-of-wal, causing recovery to stop prematurely.  In the worst cases,
this silently corrupts clusters because not all the records generated
in the local pg_wal/ have been replayed.  Oops.

When in standby mode, things are a bit better, because we'd just loop
and wait for the next record.  But, even in this case, if the startup
process does a crash recovery while standby is set up, we may finish
by attempting recovery from a different source than the local pg_wal/.
Not strictly critical, but less optimal in some cases as we could
switch to archive recovery earlier than necessary.

In a different thread, I have proposed to extend the WAL reader
facility so as an error code is returned to make the difference
between an OOM or the end of WAL with an incorrect record:
https://www.postgresql.org/message-id/ZRJ-p1dLUY0uoChc%40paquier.xyz

However this requires some ABI changes, so that's not backpatchable.

This leaves out what we can do for the existing back-branches, and
one option is to do the simplest thing I can think of: if an
allocation fails, just fail *hard*.  The allocations of the WAL reader
rely on palloc_extended(), so I'd like to suggest that we switch to
palloc() instead.  If we do so, an ERROR is promoted to a FATAL during
WAL replay, which makes sure that we will never stop recovery earlier
than we should, FATAL-ing before things go wrong.

Note that the WAL prefetching already relies on a palloc() on HEAD in
XLogReadRecordAlloc(), which would fail hard the same way on OOM.

So, attached is a proposal of patch to do something down to 12.

Thoughts and/or comments are welcome.
--
Michael

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index a17263df20..a1363e3b8f 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -43,7 +43,7 @@
 
 static void report_invalid_record(XLogReaderState *state, const char *fmt,...)
 			pg_attribute_printf(2, 3);
-static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
+static void allocate_recordbuf(XLogReaderState *state, uint32 reclength);
 static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
 							 int reqLen);
 static void XLogReaderInvalReadState(XLogReaderState *state);
@@ -155,14 +155,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir,
 	 * Allocate an initial readRecordBuf of minimal size, which can later be
 	 * enlarged if necessary.
 	 */
-	if (!allocate_recordbuf(state, 0))
-	{
-		pfree(state->errormsg_buf);
-		pfree(state->readBuf);
-		pfree(state);
-		return NULL;
-	}
-
+	allocate_recordbuf(state, 0);
 	return state;
 }
 
@@ -184,7 +177,6 @@ XLogReaderFree(XLogReaderState *state)
 
 /*
  * Allocate readRecordBuf to fit a record of at least the given length.
- * Returns true if successful, false if out of memory.
  *
  * readRecordBufSize is set to the new buffer size.
  *
@@ -196,7 +188,7 @@ XLogReaderFree(XLogReaderState *state)
  * Note: This routine should *never* be called for xl_tot_len until the header
  * of the record has been fully validated.
  */
-static bool
+static void
 allocate_recordbuf(XLogReaderState *state, uint32 reclength)
 {
 	uint32		newSize = reclength;
@@ -206,15 +198,8 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength)
 
 	if (state->readRecordBuf)
 		pfree(state->readRecordBuf);
-	state->readRecordBuf =
-		(char *) palloc_extended(newSize, MCXT_ALLOC_NO_OOM);
-	if (state->readRecordBuf == NULL)
-	{
-		state->readRecordBufSize = 0;
-		return false;
-	}
+	state->readRecordBuf = (char *) palloc(newSize);
 	state->readRecordBufSize = newSize;
-	return true;
 }
 
 /*
@@ -505,9 +490,7 @@ XLogReadRecordAlloc(XLogReaderState *state, size_t xl_tot_len, bool allow_oversi
 	/* Not enough space in the decode buffer.  Are we allowed to allocate? */
 	if (allow_oversized)
 	{
-		decoded = palloc_extended(required_space, MCXT_ALLOC_NO_OOM);
-		if (decoded == NULL)
-			return NULL;
+		decoded = palloc(required_space);
 		decoded->oversized = true;
 		return decoded;
 	}
@@ -815,13 +798,7 @@ restart:
 				Assert(gotlen <= lengthof(save_copy));
 				Assert(gotlen <= state->readRecordBufSize);
 				memcpy(save_copy, state->readRecordBuf, gotlen);
-				if (!allocate_recordbuf(state, total_len))
-				{
-					/* We treat this as a "bogus data" condition */
-					report_invalid_record(state, "record length %u at %X/%X too long",
-										  total_len, LSN_FORMAT_ARGS(RecPtr));
-					goto err;
-				}
+				allocate_recordbuf(state, total_len);
 				memcpy(state->readRecordBuf, save_copy, gotlen);
 				buffer = state->readRecordBuf + gotlen;
 			}
@@ -877,16 +854,8 @@ restart:
 		decoded = XLogReadRecordAlloc(state,
 									  total_len,
 									  true /* allow_oversized */ );
-		if (decoded == NULL)
-		{
-			/*
-			 * We failed to allocate memory for an oversized record.  As
-			 * above, we currently treat this as a "bogus data" condition.
-			 */
-			report_invalid_record(state,
-								  "out of memory while trying to decode a record of length %u", total_len);
-			goto err;
-		}
+		/* allocation should always happen under allow_oversized */
+		Assert(decoded != NULL);
 	}
 
 	if (DecodeXLogRecord(state, decoded, record, RecPtr, &errormsg))

signature.asc
Description: PGP signature

Fail hard if xlogreader.c fails on out-of-memory

Reply via email to