Hi all,

A colleague, Ethan Mertz (in CC), has discovered that we don't correctly handle WAL records that fail to be decoded because of an OOM when allocating the space they require. In Ethan's case, we bumped on the problem after an allocation failure in XLogReadRecordAlloc():
"out of memory while trying to decode a record of length"
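For reference, the failure path in XLogDecodeNextRecord() looks roughly like the following (paraphrased, not a verbatim copy of the tree).  The point is that the OOM goes through report_invalid_record(), which only fills errormsg_buf, so the caller cannot tell it apart from a record that failed its validation:

    decoded = XLogReadRecordAlloc(state, total_len,
                                  !nonblocking /* allow_oversized */ );
    if (decoded == NULL)
    {
        /* In nonblocking mode, the caller is asked to retry later. */
        if (nonblocking)
            return XLREAD_WOULDBLOCK;

        /*
         * In blocking mode, the OOM is reported like any other invalid
         * record, then we jump to the generic error path.
         */
        report_invalid_record(state,
                              "out of memory while trying to decode a record of length %u",
                              total_len);
        goto err;
    }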
As far as I can see, PerformWalRecovery() uses LOG as the elevel of its private callback in the xlogreader when going through ReadRecord(), so the failure gets reported, but recovery treats it as the end of WAL and abruptly finishes, leading to data loss.  In crash recovery, any records after the OOM would not be replayed.  At quick glance, it seems to me that this can also impact standbys, where recovery could stop earlier than it should once a consistent point has been reached.

Attached is a patch that can be applied on HEAD to inject an error; then just run the attached script xlogreader_oom.bash, or something similar, to see the failure in the logs:
LOG: redo starts at 0/1913CD0
LOG: out of memory while trying to decode a record of length 57
LOG: redo done at 0/1917358 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

It also looks like recovery_prefetch may mitigate the issue a bit, as the read is done in non-blocking mode in that case, but that's not a strong guarantee either, especially if the host is under memory pressure.

A patch is registered in the commit fest to improve the error detection handling, but as far as I can see it does not handle the OOM case; it changes ReadRecord() to use a WARNING in the redo loop:
https://www.postgresql.org/message-id/20200228.160100.2210969269596489579.horikyota.ntt%40gmail.com

Off the top of my head, any solution I can think of needs to add more information to XLogReaderState, tracking the type of error that happened close to errormsg_buf, which is where these errors are stored.  Unfortunately, none of that can be backpatched.

Comments?
--
Michael
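PS: To make the XLogReaderState idea a bit more concrete, here is a rough sketch of the direction I am thinking about.  The enum and the field name are invented for the sake of illustration, nothing here is set in stone:

/* Hypothetical additions to xlogreader.h */
typedef enum XLogReaderErrorKind
{
    XLOG_READER_NO_ERROR = 0,
    XLOG_READER_INVALID_DATA,   /* record failed its validation checks */
    XLOG_READER_OOM             /* allocation failure while decoding */
} XLogReaderErrorKind;

struct XLogReaderState
{
    /* [existing fields] */
    char       *errormsg_buf;
    XLogReaderErrorKind errorkind;  /* cause of the error in errormsg_buf */
    /* [more existing fields] */
};

The OOM path in XLogDecodeNextRecord() would set errorkind to XLOG_READER_OOM before jumping to its error path, and the redo loop could then treat that case as a hard failure (FATAL, say) instead of assuming that the end of WAL has been reached.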
#!/bin/bash
killall -9 postgres

# Setup Postgres
rm -rf /tmp/data
initdb -D /tmp/data

cat <<EOL >> /tmp/data/postgresql.conf
logging_collector = on
log_min_messages = debug1
log_min_error_statement = debug1
#log_line_prefix = '%m [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h '
recovery_prefetch = off
EOL

pg_ctl start -D /tmp/data
createdb $USER

psql <<EOL
CREATE TABLE aa (a int);
CHECKPOINT;
INSERT INTO aa VALUES (1);
-- Generate enough records to trigger the OOM counter in xlogreader.c.
DO \$\$
BEGIN
  FOR i IN 1..1000 LOOP
    EXECUTE 'SELECT pg_logical_emit_message(true, ''foo'', ''bar'');';
  END LOOP;
END;
\$\$;
INSERT INTO aa VALUES (2);
SELECT pg_current_wal_lsn();
EOL

# Force OOM at recovery.
killall -9 postgres
touch /tmp/xlogreader_oom

# Startup bumps on OOM, succeeds earlier, but it should not.
pg_ctl start -D /tmp/data

# This loses some data, only one record is returned.
psql -c 'table aa'

rm /tmp/xlogreader_oom
From 36de193b974eed4c45391af34c36f392a0968166 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Tue, 1 Aug 2023 11:49:53 +0900
Subject: [PATCH] Tweak to force OOM behavior when replaying records

---
 src/backend/access/transam/xlogreader.c | 27 ++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index c9f9f6e98f..73006f05b8 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -547,6 +547,7 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking)
 	int			readOff;
 	DecodedXLogRecord *decoded;
 	char	   *errormsg;		/* not used */
+	bool		trigger_oom = false;
 
 	/*
 	 * randAccess indicates whether to verify the previous-record pointer of
@@ -691,7 +692,31 @@ restart:
 	decoded = XLogReadRecordAlloc(state,
 								  total_len,
 								  !nonblocking /* allow_oversized */ );
-	if (decoded == NULL)
+
+#ifndef FRONTEND
+	/*
+	 * Trick to emulate an OOM after a hardcoded number of records
+	 * replayed.
+	 */
+	{
+		struct stat fstat;
+		static int	counter = 0;
+
+		if (stat("/tmp/xlogreader_oom", &fstat) == 0)
+		{
+			counter++;
+			if (counter >= 100)
+			{
+				trigger_oom = true;
+
+				/* Reset counter, to not fail when shutting down WAL */
+				counter = 0;
+			}
+		}
+	}
+#endif
+
+	if (decoded == NULL || trigger_oom)
 	{
 		/*
 		 * There is no space in the decode buffer.  The caller should help
-- 
2.40.1