(moving discussion to -hackers, for patch-review)

On Mon, Dec 16, 2024 at 11:30 PM Laurenz Albe <laurenz.a...@cybertec.at> wrote:
>
> On Mon, 2024-12-16 at 16:25 -0800, Will Storey wrote:
> > I would like to disable vacuum's truncate behaviour for autovacuum.
> > Previously I had an outage due to its access exclusive lock when it was
> > replicated to a hot standby.

I can attest to one production outage caused by this behaviour of autovacuum.
The truncate operation performed by autovacuum, when being replayed on the
replica, was blocked by a query. Any new queries on that relation were blocked
by replication.

> > When that outage happened it was from a VACUUM call in a cronjob rather
> > than autovacuum. I now run such VACUUMs with TRUNCATE false which avoids
> > the issue for these. However I've realized that autovacuum could cause this
> > as well.
> >
> > I believe the only way to disable this for autovacuum is by changing the
> > vacuum_truncate storage parameters on tables. (Ignoring the now removed
> > old_snapshot_threshold option).
>
> Yes, you can only do that table by table.

That is unfortunate. Although doing so provides a granular control over which
relations one would like to exclude from truncation, it may not always be
desirable; a DBA/sysadmin may want to prevent this problem system-wide.

Also, this not really scalable since it requires that a maintenance operation
regularly connect to every database and apply this setting to all the relations,
for the fear that there may be new objects somewhere in the cluster since last
maintenance, which may cause this problem. It would be error prone, too,
considering that the list of databases in a cluster may change over time. And
then there's the added burden of monitoring the status of this maintenance
operation to ensure it's running successfully every time.

Turning on a system-wide setting that disables autovacuum truncation may look
like a heavy hammer, but in certain situations this may be preferable to the
risk of causing outage in production systems. It may be preferable to let the
system consume disk space by not truncating the tables, as opposed to running
the risk of blocked queries. Disk is cheap, and is possibly already being
monitored in a production system.

I understand Jeremy's contention upthread against adding such a feature at
global level, but I'm in favor of adding this feature since it prevents a sudden
and unpredictable impact on production systems, and instead leads to a gradual
escalation of the problem that can be monitored and addressed by a sysadmin/DBA
at a time that's convenient for them.

> > I am also wondering if having an autovacuum setting to control it would be
> > a good idea for a feature.
>
> I'm all for that.

Please see attached an initial patch to disable truncation behaviour in
autovacuum. This patch retains the default behavior of autovacuum truncating
relations. The user is allowed to change the behaviour and disable relation
truncations system-wide by setting autovacuum_disable_vacuum_truncate = true.
Better parameter names welcome :-)

One additional improvement I can think of is to emit a WARNING or NOTICE message
that truncate operation is being skipped, perhaps only if the truncation
would've freed up space over a certain threshold.

Perhaps there's value in letting this parameter be specified at database level,
but I'm not able to think of a reason why someone would want to disable this
behaviour on just one database. So leaving the parameter context to be the same
as most other autovacuum parameters: SIGHUP.

Best regards,
Gurjeet
http://Gurje.et
commit 84e8eebb87bc2c58feae847efd995bc055701688
Author: Gurjeet Singh <gurjeet@singh.im>
Date:   Thu Jan 23 19:37:33 2025 -0800

    Version 1

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 0ab921a169..17e22c08e5 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -115,6 +115,7 @@
  * GUC parameters
  */
 bool		autovacuum_start_daemon = false;
+bool		autovacuum_disable_vacuum_truncate = false;
 int			autovacuum_worker_slots;
 int			autovacuum_max_workers;
 int			autovacuum_work_mem = -1;
@@ -2811,12 +2812,16 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 			(!wraparound ? VACOPT_SKIP_LOCKED : 0);
 
 		/*
-		 * index_cleanup and truncate are unspecified at first in autovacuum.
-		 * They will be filled in with usable values using their reloptions
-		 * (or reloption defaults) later.
+		 * index_cleanup is unspecified at first in autovacuum. truncate is
+		 * unspecified, unless it is disabled via the GUC parameter.
+		 *
+		 * The unspecified options will be filled in with usable values using
+		 * their reloptions (or reloption defaults) later.
 		 */
 		tab->at_params.index_cleanup = VACOPTVALUE_UNSPECIFIED;
-		tab->at_params.truncate = VACOPTVALUE_UNSPECIFIED;
+		tab->at_params.truncate = autovacuum_disable_vacuum_truncate
+									? VACOPTVALUE_DISABLED
+									: VACOPTVALUE_UNSPECIFIED;
 		/* As of now, we don't support parallel vacuum for autovacuum */
 		tab->at_params.nworkers = -1;
 		tab->at_params.freeze_min_age = freeze_min_age;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 38cb9e970d..40368e339c 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1509,6 +1509,16 @@ struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"autovacuum_disable_vacuum_truncate", PGC_SIGHUP, VACUUM_AUTOVACUUM,
+			gettext_noop("Disables autovacuum behavior of truncatiing relations."),
+			NULL
+		},
+		&autovacuum_disable_vacuum_truncate,
+		false,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"trace_notify", PGC_USERSET, DEVELOPER_OPTIONS,
 			gettext_noop("Generates debugging output for LISTEN and NOTIFY."),
diff --git a/src/include/postmaster/autovacuum.h b/src/include/postmaster/autovacuum.h
index 54e01c81d6..c7597407d2 100644
--- a/src/include/postmaster/autovacuum.h
+++ b/src/include/postmaster/autovacuum.h
@@ -28,6 +28,7 @@ typedef enum
 
 /* GUC variables */
 extern PGDLLIMPORT bool autovacuum_start_daemon;
+extern PGDLLIMPORT bool autovacuum_disable_vacuum_truncate;
 extern PGDLLIMPORT int autovacuum_worker_slots;
 extern PGDLLIMPORT int autovacuum_max_workers;
 extern PGDLLIMPORT int autovacuum_work_mem;

Reply via email to