On Tue, Apr 4, 2023 at 8:08 PM Aleksander Alekseev <aleksan...@timescale.com> wrote: > [v6]
0001: Looks good to me. I've just made some small edits for v7 and wrote a commit message to explain how we got here. This can be backpatched all the way, as it's simply a correction. I do want to test on v11 first just for completeness. (The reality has already been tested by others back to 9.6 but there's no substitute for trying it yourself). I hope to commit soon after that. 0002: I've been testing wraparound using the v3 convenience function in [1] to reach xidStopLimit: -- reduce log spam alter system set log_min_messages = error; alter system set client_min_messages = error; -- restart -- no actual replication, just for testing dropping it select pg_create_physical_replication_slot('foo', true, false); create table t (i int); BEGIN; insert into t values(1); PREPARE TRANSACTION 'trx_id_pin'; -- get to xidStopLimit select consume_xids(1*1000*1000*1000); insert into t values(1); select consume_xids(1*1000*1000*1000); insert into t values(1); select consume_xids( 140*1000*1000); insert into t values(1); select consume_xids( 10*1000*1000); SELECT datname, age(datfrozenxid) FROM pg_database; -- works just fine select pg_drop_replication_slot('foo'); COMMIT PREPARED 'trx_id_pin'; -- watch autovacuum take care of it automatically: SELECT datname, age(datfrozenxid) FROM pg_database; The consume_xids function builds easily on PG14, but before that it would need a bit of work because data types changed. That coincidentally was the first version to include the failsafe, which is convenient in this scenario. I'd like to do testing on PG12/13 before commit, which would require turning off truncation in the command (and can also be made faster by turning off index cleanup), but I'm also okay with going ahead with just going back to PG14 at first. That also safest. I made some small changes and wrote a suitably comprehensive commit message. I separated the possible uses for single-user mode into a separate paragraph in the "Note:" , moved the justification for the 3-million xid margin there, and restored the link to how to run it (I already mentioned we still need this info, but didn't notice this part didn't make it back in). 0003: It really needs a more comprehensive change, and just making a long hint even longer doesn't seem worth doing. I'd like to set that aside and come back to it. I've left it out of the attached set. [1] https://www.postgresql.org/message-id/CAD21AoBZ3t%2BfRtVamQTA%2BwBJaapFUY1dfP08-rxsQ%2BfouPvgKg%40mail.gmail.com -- John Naylor EDB: http://www.enterprisedb.com
From f5fa3605effd624b263ff3e5dbb8b3e5c8992dba Mon Sep 17 00:00:00 2001 From: John Naylor <john.nay...@postgresql.org> Date: Sat, 29 Apr 2023 14:23:50 +0700 Subject: [PATCH v7 2/2] Stop telling users to run VACUUM in a single-user mode Single-user mode is almost always the worst thing to reach for in a VACUUM emergency: * Restarting in single user mode requires a shutdown checkpoint * The user interface is completely different, and awful * The buffer cache is completely cold * The checkpointer, background writer and WAL writer are not running * Without checkpoints WAL segments can not be rotated and reused * Replication is not running, so after VACUUM is done and database is started in normal mode, there is a huge backlog to replicate * pg_stat_progress_vacuum is not available so there is no indication of when the command will complete * VACUUM VERBOSE doesn't work - there is no output from single-user mode vacuum, with or without VERBOSE If that weren't enough, it's also unsafe because the wraparound limits are not enforced. It is by no means impossible to corrupt the database by mistake, such as by a user running VACUUM FULL because it sounds better. As mentioned in commit XXXXXXXXX, the system is perfectly capable of accepting commands when reaching xidStopLimit. Most VACUUM operations will work normally, with one exception: A new XID is required when truncating the relation if wal_level is above "minimal". As of v14 the failsafe mechanism disables truncation some time before reaching xidStopLimit, so this is not an issue in practice. By remaining in multi-user mode, users still have read-only access to their database, they can use parallelism, they can use command line utilities like vacuumdb, and they can remotely access the database using normal clients. The only reason to restart in single-user mode is to DROP or TRUNCATE a table, when it is suspected that doing that would be faster than vacuuming. Also add an explicit note warning against using single-user mode. Backpatch to v14, which is the first version with the VACUUM failsafe. XXX We should consider v12-13 with "VACUUM (TRUNCATE off, INDEX_CLEANUP off);", but it's not yet convenient to get to xidStopLimit before v14. XXX We could consider v11 with careful instructions about redoing VACUUMs in single-user mode when truncation is necessary. Aleksander Alekseev, with some adjustments by me Reported as various times by (at least) Hannu Krosing, Robert Haas, and Andres Freund Discussion: https://postgr.es/m/caj7c6tm2d277u2wh8x78kg8ph3tduqebv3_jcjqakyqfhcf...@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CA%2BTgmob1QCMJrHwRBK8HZtGsr%2B6cJANRQw2mEgJ9e%3DD%2Bz7cOsw%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/CA%2BTgmoYPfofQmRtUan%3DA3aWE9wFsJaOFr%2BW_ys2pPkNPr-2FZw%40mail.gmail.com --- doc/src/sgml/maintenance.sgml | 32 +++++++++++++++++++++-------- src/backend/access/transam/varsup.c | 4 ++-- 2 files changed, 25 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index 105a9900cb..ae9a35fb73 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -669,22 +669,36 @@ HINT: To prevent entering read-only mode, execute a database-wide VACUUM in tha <programlisting> ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb" -HINT: Stop the postmaster and vacuum that database in single-user mode. +HINT: Execute a database-wide VACUUM in that database. </programlisting> In this condition any transactions already in progress can continue, but only read-only transactions can be started. Operations that modify database records or truncate relations will fail. - - The three-million-transaction safety margin exists to let the - administrator recover without data loss, by manually executing the - required <command>VACUUM</command> commands. However - the only way to do this is to stop the server and start the server in single-user - mode to execute <command>VACUUM</command>. See the - <xref linkend="app-postgres"/> reference page for details about using - single-user mode. + The <command>VACUUM</command> command can still be run normally to recover. + <!-- v12 and v13 need VACUUM (TRUNCATE off, INDEX_CLEANUP off); > + <!-- v11 maybe mention that it will often succed, but will need S-U if truncation happens > </para> + <note> + <para> + In earlier versions it was required to stop the postmaster and + <command>VACUUM</command> the database in a single-user mode. There is no + need to use single-user mode anymore, and in fact it's strongly + discouraged: It increases downtime, makes monitoring impossible, + disables replication, bypasses safeguards against wraparound, etc. + </para> + + <para> + The only reason to use single-user mode in an emergency is to e.g. + <command>TRUNCATE</command> or <command>DROP</command> unneeded tables + to avoid needing to <command>VACUUM</command> them. + The three-million-transaction safety margin exists to let the + administrator do this. See the <xref linkend="app-postgres"/> + reference page for details about using single-user mode. + </para> + </note> + <sect3 id="vacuum-for-multixact-wraparound"> <title>Multixacts and Wraparound</title> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c index 485d8ebf81..667da0fd64 100644 --- a/src/backend/access/transam/varsup.c +++ b/src/backend/access/transam/varsup.c @@ -128,14 +128,14 @@ GetNewTransactionId(bool isSubXact) (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"", oldest_datname), - errhint("Stop the postmaster and vacuum that database in single-user mode.\n" + errhint("Execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); else ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u", oldest_datoid), - errhint("Stop the postmaster and vacuum that database in single-user mode.\n" + errhint("Execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); } else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit)) -- 2.39.2
From fe9ebc1f89e946f2d9e47fdb4f889a3872532338 Mon Sep 17 00:00:00 2001 From: John Naylor <john.nay...@postgresql.org> Date: Fri, 28 Apr 2023 16:08:33 +0700 Subject: [PATCH v7 1/2] Correct outdated docs and messages regarding XID limits Previously, when approaching xidStopLimit or xidWrapLimit, log messages would warn against a "database shutdown", and when it reached those limits claimed that it "is not accepting commands". This language originated in commit 60b2444cc when the xidStopLimit was added in 2005. At that time, even a trivial SELECT would have failed. Commit 295e63983d in 2007 introduced virtual transaction IDs, which allowed actual XIDs to be allocated lazily when it is necessary to do so, such as when modifying database records. Since then, the behavior at these limits is merely to refuse to allocate new XIDs, so read-only queries can continue to be initiated. This has been wrong for a very long time, so backpatch to all supported branches. Aleksander Alekseev, with some editing by me Reviewed by Pavel Borisov Discussion: https://postgr.es/m/caj7c6tm2d277u2wh8x78kg8ph3tduqebv3_jcjqakyqfhcf...@mail.gmail.com --- doc/src/sgml/maintenance.sgml | 22 ++++++++++++---------- src/backend/access/transam/multixact.c | 4 ++-- src/backend/access/transam/varsup.c | 12 ++++++------ 3 files changed, 20 insertions(+), 18 deletions(-) diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index 9cf9d030a8..105a9900cb 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -656,7 +656,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database; <programlisting> WARNING: database "mydb" must be vacuumed within 39985967 transactions -HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database. +HINT: To prevent entering read-only mode, execute a database-wide VACUUM in that database. </programlisting> (A manual <command>VACUUM</command> should fix the problem, as suggested by the @@ -664,23 +664,25 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data superuser, else it will fail to process system catalogs and thus not be able to advance the database's <structfield>datfrozenxid</structfield>.) If these warnings are - ignored, the system will shut down and refuse to start any new - transactions once there are fewer than three million transactions left - until wraparound: + ignored, the system will refuse to allocate new XIDs once there are + fewer than three million transactions left until wraparound: <programlisting> -ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb" +ERROR: database is not accepting commands that generate new XIDs to avoid wraparound data loss in database "mydb" HINT: Stop the postmaster and vacuum that database in single-user mode. </programlisting> + In this condition any transactions already in progress can continue, + but only read-only transactions can be started. Operations that + modify database records or truncate relations will fail. + The three-million-transaction safety margin exists to let the administrator recover without data loss, by manually executing the - required <command>VACUUM</command> commands. However, since the system will not - execute commands once it has gone into the safety shutdown mode, + required <command>VACUUM</command> commands. However the only way to do this is to stop the server and start the server in single-user - mode to execute <command>VACUUM</command>. The shutdown mode is not enforced - in single-user mode. See the <xref linkend="app-postgres"/> reference - page for details about using single-user mode. + mode to execute <command>VACUUM</command>. See the + <xref linkend="app-postgres"/> reference page for details about using + single-user mode. </para> <sect3 id="vacuum-for-multixact-wraparound"> diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index fe6698d5ff..1dd29c02cc 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -2335,7 +2335,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid, multiWrapLimit - curMulti, oldest_datname, multiWrapLimit - curMulti), - errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n" + errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); else ereport(WARNING, @@ -2344,7 +2344,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid, multiWrapLimit - curMulti, oldest_datoid, multiWrapLimit - curMulti), - errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n" + errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); } } diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c index 334adac09e..485d8ebf81 100644 --- a/src/backend/access/transam/varsup.c +++ b/src/backend/access/transam/varsup.c @@ -126,14 +126,14 @@ GetNewTransactionId(bool isSubXact) if (oldest_datname) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"", + errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database \"%s\"", oldest_datname), errhint("Stop the postmaster and vacuum that database in single-user mode.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); else ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u", + errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database with OID %u", oldest_datoid), errhint("Stop the postmaster and vacuum that database in single-user mode.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); @@ -148,14 +148,14 @@ GetNewTransactionId(bool isSubXact) (errmsg("database \"%s\" must be vacuumed within %u transactions", oldest_datname, xidWrapLimit - xid), - errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n" + errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); else ereport(WARNING, (errmsg("database with OID %u must be vacuumed within %u transactions", oldest_datoid, xidWrapLimit - xid), - errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n" + errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); } @@ -463,14 +463,14 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid) (errmsg("database \"%s\" must be vacuumed within %u transactions", oldest_datname, xidWrapLimit - curXid), - errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n" + errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); else ereport(WARNING, (errmsg("database with OID %u must be vacuumed within %u transactions", oldest_datoid, xidWrapLimit - curXid), - errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n" + errhint("To prevent entering read-only mode, execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); } } -- 2.39.2