On Tue, Apr 4, 2023 at 8:08 PM Aleksander Alekseev <aleksan...@timescale.com>
wrote:
> [v6]

0001:

Looks good to me. I've just made some small edits for v7 and wrote a commit
message to explain how we got here. This can be backpatched all the way, as
it's simply a correction. I do want to test on v11 first just for
completeness. (The reality has already been tested by others back to 9.6
but there's no substitute for trying it yourself). I hope to commit soon
after that.

0002:

I've been testing wraparound using the v3 convenience function in [1] to
reach xidStopLimit:

-- reduce log spam
alter system set log_min_messages = error;
alter system set client_min_messages = error;
-- restart

-- no actual replication, just for testing dropping it
select pg_create_physical_replication_slot('foo', true, false);

create table t (i int);

BEGIN;
insert into t values(1);
PREPARE TRANSACTION 'trx_id_pin';

-- get to xidStopLimit
select consume_xids(1*1000*1000*1000);
insert into t values(1);
select consume_xids(1*1000*1000*1000);
insert into t values(1);
select consume_xids(   140*1000*1000);
insert into t values(1);
select consume_xids(    10*1000*1000);

SELECT datname, age(datfrozenxid) FROM pg_database;

-- works just fine
select pg_drop_replication_slot('foo');

COMMIT PREPARED 'trx_id_pin';

-- watch autovacuum take care of it automatically:
SELECT datname, age(datfrozenxid) FROM pg_database;

The consume_xids function builds easily on PG14, but before that it would
need a bit of work because data types changed. That coincidentally was the
first version to include the failsafe, which is convenient in this
scenario. I'd like to do testing on PG12/13 before commit, which would
require turning off truncation in the command (and can also be made faster
by turning off index cleanup), but I'm also okay with going ahead with just
going back to PG14 at first. That also safest.

I made some small changes and wrote a suitably comprehensive commit
message. I separated the possible uses for single-user mode into a separate
paragraph in the "Note:" , moved the justification for the 3-million xid
margin there, and restored the link to how to run it (I already mentioned
we still need this info, but didn't notice this part didn't make it back
in).

0003:

It really needs a more comprehensive change, and just making a long hint
even longer doesn't seem worth doing. I'd like to set that aside and come
back to it. I've left it out of the attached set.

[1]
https://www.postgresql.org/message-id/CAD21AoBZ3t%2BfRtVamQTA%2BwBJaapFUY1dfP08-rxsQ%2BfouPvgKg%40mail.gmail.com

--
John Naylor
EDB: http://www.enterprisedb.com
From f5fa3605effd624b263ff3e5dbb8b3e5c8992dba Mon Sep 17 00:00:00 2001
From: John Naylor <john.nay...@postgresql.org>
Date: Sat, 29 Apr 2023 14:23:50 +0700
Subject: [PATCH v7 2/2] Stop telling users to run VACUUM in a single-user mode

Single-user mode is almost always the worst thing to reach for in a
VACUUM emergency:

* Restarting in single user mode requires a shutdown checkpoint
* The user interface is completely different, and awful
* The buffer cache is completely cold
* The checkpointer, background writer and WAL writer are not running
* Without checkpoints WAL segments can not be rotated and reused
* Replication is not running, so after VACUUM is done and database
  is started in normal mode, there is a huge backlog to replicate
* pg_stat_progress_vacuum is not available so there is no indication
  of when the command will complete
* VACUUM VERBOSE doesn't work - there is no output from single-user
  mode vacuum, with or without VERBOSE

If that weren't enough, it's also unsafe because the wraparound
limits are not enforced. It is by no means impossible to corrupt the
database by mistake, such as by a user running VACUUM FULL because it
sounds better.

As mentioned in commit XXXXXXXXX, the system is perfectly capable of
accepting commands when reaching xidStopLimit. Most VACUUM operations
will work normally, with one exception: A new XID is required when
truncating the relation if wal_level is above "minimal". As of v14
the failsafe mechanism disables truncation some time before reaching
xidStopLimit, so this is not an issue in practice.

By remaining in multi-user mode, users still have read-only access to
their database, they can use parallelism, they can use command line
utilities like vacuumdb, and they can remotely access the database
using normal clients.

The only reason to restart in single-user mode is to DROP or TRUNCATE
a table, when it is suspected that doing that would be faster than
vacuuming.

Also add an explicit note warning against using single-user mode.

Backpatch to v14, which is the first version with the VACUUM failsafe.

XXX We should consider v12-13 with "VACUUM (TRUNCATE off, INDEX_CLEANUP
off);", but it's not yet convenient to get to xidStopLimit before v14.

XXX We could consider v11 with careful instructions about redoing
VACUUMs in single-user mode when truncation is necessary.

Aleksander Alekseev, with some adjustments by me

Reported as various times by (at least) Hannu Krosing, Robert Haas,
and Andres Freund

Discussion: https://postgr.es/m/caj7c6tm2d277u2wh8x78kg8ph3tduqebv3_jcjqakyqfhcf...@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CA%2BTgmob1QCMJrHwRBK8HZtGsr%2B6cJANRQw2mEgJ9e%3DD%2Bz7cOsw%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CA%2BTgmoYPfofQmRtUan%3DA3aWE9wFsJaOFr%2BW_ys2pPkNPr-2FZw%40mail.gmail.com
---
 doc/src/sgml/maintenance.sgml       | 32 +++++++++++++++++++++--------
 src/backend/access/transam/varsup.c |  4 ++--
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 105a9900cb..ae9a35fb73 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -669,22 +669,36 @@ HINT:  To prevent entering read-only mode, execute a database-wide VACUUM in tha
 
 <programlisting>
 ERROR:  database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
-HINT:  Stop the postmaster and vacuum that database in single-user mode.
+HINT:  Execute a database-wide VACUUM in that database.
 </programlisting>
 
     In this condition any transactions already in progress can continue,
     but only read-only transactions can be started. Operations that
     modify database records or truncate relations will fail.
-
-    The three-million-transaction safety margin exists to let the
-    administrator recover without data loss, by manually executing the
-    required <command>VACUUM</command> commands.  However
-    the only way to do this is to stop the server and start the server in single-user
-    mode to execute <command>VACUUM</command>. See the
-    <xref linkend="app-postgres"/> reference page for details about using
-    single-user mode.
+    The <command>VACUUM</command> command can still be run normally to recover.
+         <!-- v12 and v13 need VACUUM (TRUNCATE off, INDEX_CLEANUP off); >
+         <!-- v11 maybe mention that it will often succed, but will need S-U if truncation happens >
    </para>
 
+   <note>
+    <para>
+     In earlier versions it was required to stop the postmaster and
+     <command>VACUUM</command> the database in a single-user mode. There is no
+     need to use single-user mode anymore, and in fact it's strongly
+     discouraged: It increases downtime, makes monitoring impossible,
+     disables replication, bypasses safeguards against wraparound, etc.
+    </para>
+
+    <para>
+     The only reason to use single-user mode in an emergency is to e.g.
+     <command>TRUNCATE</command> or <command>DROP</command> unneeded tables
+     to avoid needing to <command>VACUUM</command> them.
+     The three-million-transaction safety margin exists to let the
+     administrator do this. See the <xref linkend="app-postgres"/>
+     reference page for details about using single-user mode.
+    </para>
+   </note>
+
    <sect3 id="vacuum-for-multixact-wraparound">
     <title>Multixacts and Wraparound</title>
 
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 485d8ebf81..667da0fd64 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -128,14 +128,14 @@ GetNewTransactionId(bool isSubXact)
 						(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 						 errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
 								oldest_datname),
-						 errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+						 errhint("Execute a database-wide VACUUM in that database.\n"
 								 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 						 errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
 								oldest_datoid),
-						 errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
+						 errhint("Execute a database-wide VACUUM in that database.\n"
 								 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
 		}
 		else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
-- 
2.39.2

From fe9ebc1f89e946f2d9e47fdb4f889a3872532338 Mon Sep 17 00:00:00 2001
From: John Naylor <john.nay...@postgresql.org>
Date: Fri, 28 Apr 2023 16:08:33 +0700
Subject: [PATCH v7 1/2] Correct outdated docs and messages regarding XID
 limits

Previously, when approaching xidStopLimit or xidWrapLimit, log messages
would warn against a "database shutdown", and when it reached those
limits claimed that it "is not accepting commands". This language
originated in commit 60b2444cc when the xidStopLimit was added in
2005. At that time, even a trivial SELECT would have failed.

Commit 295e63983d in 2007 introduced virtual transaction IDs, which
allowed actual XIDs to be allocated lazily when it is necessary
to do so, such as when modifying database records. Since then, the
behavior at these limits is merely to refuse to allocate new XIDs,
so read-only queries can continue to be initiated.

This has been wrong for a very long time, so backpatch to all
supported branches.

Aleksander Alekseev, with some editing by me

Reviewed by Pavel Borisov
Discussion: https://postgr.es/m/caj7c6tm2d277u2wh8x78kg8ph3tduqebv3_jcjqakyqfhcf...@mail.gmail.com
---
 doc/src/sgml/maintenance.sgml          | 22 ++++++++++++----------
 src/backend/access/transam/multixact.c |  4 ++--
 src/backend/access/transam/varsup.c    | 12 ++++++------
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a8..105a9900cb 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -656,7 +656,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
 
 <programlisting>
 WARNING:  database "mydb" must be vacuumed within 39985967 transactions
-HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
+HINT:  To prevent entering read-only mode, execute a database-wide VACUUM in that database.
 </programlisting>
 
     (A manual <command>VACUUM</command> should fix the problem, as suggested by the
@@ -664,23 +664,25 @@ HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that data
     superuser, else it will fail to process system catalogs and thus not
     be able to advance the database's <structfield>datfrozenxid</structfield>.)
     If these warnings are
-    ignored, the system will shut down and refuse to start any new
-    transactions once there are fewer than three million transactions left
-    until wraparound:
+    ignored, the system will refuse to allocate new XIDs once there are
+    fewer than three million transactions left until wraparound:
 
 <programlisting>
-ERROR:  database is not accepting commands to avoid wraparound data loss in database "mydb"
+ERROR:  database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb"
 HINT:  Stop the postmaster and vacuum that database in single-user mode.
 </programlisting>
 
+    In this condition any transactions already in progress can continue,
+    but only read-only transactions can be started. Operations that
+    modify database records or truncate relations will fail.
+
     The three-million-transaction safety margin exists to let the
     administrator recover without data loss, by manually executing the
-    required <command>VACUUM</command> commands.  However, since the system will not
-    execute commands once it has gone into the safety shutdown mode,
+    required <command>VACUUM</command> commands.  However
     the only way to do this is to stop the server and start the server in single-user
-    mode to execute <command>VACUUM</command>.  The shutdown mode is not enforced
-    in single-user mode.  See the <xref linkend="app-postgres"/> reference
-    page for details about using single-user mode.
+    mode to execute <command>VACUUM</command>. See the
+    <xref linkend="app-postgres"/> reference page for details about using
+    single-user mode.
    </para>
 
    <sect3 id="vacuum-for-multixact-wraparound">
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index fe6698d5ff..1dd29c02cc 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2335,7 +2335,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
 								   multiWrapLimit - curMulti,
 								   oldest_datname,
 								   multiWrapLimit - curMulti),
-					 errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+					 errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
 							 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
 		else
 			ereport(WARNING,
@@ -2344,7 +2344,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid,
 								   multiWrapLimit - curMulti,
 								   oldest_datoid,
 								   multiWrapLimit - curMulti),
-					 errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+					 errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
 							 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
 	}
 }
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..485d8ebf81 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -126,14 +126,14 @@ GetNewTransactionId(bool isSubXact)
 			if (oldest_datname)
 				ereport(ERROR,
 						(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-						 errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
+						 errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"",
 								oldest_datname),
 						 errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
 								 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-						 errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
+						 errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u",
 								oldest_datoid),
 						 errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
 								 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
@@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact)
 						(errmsg("database \"%s\" must be vacuumed within %u transactions",
 								oldest_datname,
 								xidWrapLimit - xid),
-						 errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+						 errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
 								 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
 			else
 				ereport(WARNING,
 						(errmsg("database with OID %u must be vacuumed within %u transactions",
 								oldest_datoid,
 								xidWrapLimit - xid),
-						 errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+						 errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
 								 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
 		}
 
@@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
 					(errmsg("database \"%s\" must be vacuumed within %u transactions",
 							oldest_datname,
 							xidWrapLimit - curXid),
-					 errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+					 errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
 							 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
 		else
 			ereport(WARNING,
 					(errmsg("database with OID %u must be vacuumed within %u transactions",
 							oldest_datoid,
 							xidWrapLimit - curXid),
-					 errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
+					 errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n"
 							 "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
 	}
 }
-- 
2.39.2

Reply via email to