Hi,

Because of the way autovacuum_vacuum_[insert]_scale_factor works, autovacuums trigger less frequently as the relation gets larger.
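To make that concrete (illustrative numbers, not taken from the benchmark below): with the default autovacuum_vacuum_insert_threshold of 1000 and autovacuum_vacuum_insert_scale_factor of 0.2, a table with 1 million tuples needs roughly 1000 + 0.2 * 1,000,000 = 201,000 newly inserted tuples before an insert-triggered vacuum fires, while a table with 1 billion tuples needs roughly 200 million.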
See this math in relation_needs_vacanalyze:

    vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples;

For an insert-only table, nearly all of the unvacuumed pages will be eligible to be set all-visible, and many will be eligible to be set all-frozen. Because normal vacuums can skip all-visible pages, proactively setting these pages all-visible by vacuuming them sooner often reduces I/O overhead: the sooner they are vacuumed after last being touched, the more likely they are to still be in shared buffers. Vacuuming these pages more proactively and setting them frozen also helps amortize the work of aggressive vacuums -- which often negatively impact the performance of the system.

By considering only the unfrozen portion of the table when calculating the vacuum insert threshold, we can trigger vacuums more proactively on insert-heavy tables. This changes the definition of autovacuum_vacuum_insert_scale_factor to a percentage of the "active" (unfrozen) table size.

The attached patch does this; a rough sketch of the resulting calculation appears after the benchmark results below. I've estimated the unfrozen percentage of the table by adding a new field to pg_class, relallfrozen, which is updated in the same places as relallvisible.

As an example of this patch in action, I designed a benchmark in which a table is bulk-loaded with 1 GB of data using COPY FREEZE. Then I run a custom pgbench workload with four clients, each inserting 10 tuples per transaction into the table for 1,000,000 transactions. Note that I configured Postgres so that the effects of this patch can be observed on a compressed timeline. At the bottom of this mail, I've included details on all of the GUCs I set and why.

Over the course of the same number of transactions, master triggered 8 autovacuums of the table and the patch triggered 16. With the patch, despite doing twice as many vacuums, autovacuum workers did 10% fewer reads and 93% fewer writes. At the end of the benchmark, master had emitted twice as many FPIs as the patched version of Postgres.

More frequent vacuums mean each vacuum scans fewer pages, but, more interestingly, the first vacuum after a checkpoint is much more efficient. With the patch, the first vacuum after a checkpoint emits half as many FPIs, and you can see that only 18 pages were newly dirtied. So, with the patch, the pages being vacuumed are usually still in shared buffers and still dirty.

Master
------
2024-10-22 13:53:14.293 EDT [3594] LOG:  checkpoint starting: time
2024-10-22 13:53:27.849 EDT [3964] LOG:  automatic vacuum of table "history"
        pages: 0 removed, 753901 remain, 151589 scanned (20.11% of total)
        I/O timings: read: 77.962 ms, write: 92.879 ms
        avg read rate: 95.840 MB/s, avg write rate: 96.852 MB/s
        buffer usage: 268318 hits, 35133 reads, 35504 dirtied
        WAL usage: 218320 records, 98672 full page images, 71314906 bytes

Patch
-----
2024-10-22 13:48:43.951 EDT [1471] LOG:  checkpoint starting: time
2024-10-22 13:48:59.741 EDT [1802] LOG:  automatic vacuum of table "history"
        pages: 0 removed, 774375 remain, 121367 scanned (15.67% of total)
        I/O timings: read: 2.974 ms, write: 4.434 ms
        avg read rate: 1.363 MB/s, avg write rate: 0.126 MB/s
        buffer usage: 242817 hits, 195 reads, 18 dirtied
        WAL usage: 121389 records, 49216 full page images, 34408291 bytes

While it is true that timing will change significantly from run to run, I observed over many runs that the more frequent vacuums of the table led to less overall overhead, because pages are vacuumed before they are evicted from shared buffers.
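To recap the mechanism in code form, here is a rough sketch (simplified; variable names and guards may differ from what the attached patch actually does) of how relallfrozen could feed into the threshold calculation in relation_needs_vacanalyze:

    /*
     * Estimate the unfrozen fraction of the table from the new
     * pg_class.relallfrozen counter and scale the reltuples term of the
     * insert threshold by it, so a mostly-frozen table is vacuumed again
     * after a comparatively small number of new inserts.
     */
    float4      pcnt_unfrozen = 1.0;

    if (relpages > 0)
        pcnt_unfrozen = 1.0 - ((float4) relallfrozen / relpages);

    vacinsthresh = (float4) vac_ins_base_thresh +
        vac_ins_scale_factor * reltuples * pcnt_unfrozen;

The intended effect is that once most of the table is frozen, only the unfrozen portion counts against the scale factor, so insert-heavy tables are vacuumed while the recently inserted pages are still likely to be in shared buffers.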
Below is a detailed description of the benchmark and Postgres configuration:

Benchmark
=========

Set these GUCs:

-- initial table data should fill shared buffers
shared_buffers=1GB
-- give us a chance to try and vacuum the table a bunch of times
autovacuum_naptime=2
-- all checkpoints should be triggered by timing
max/min_wal_size=150GB
-- let's get at least 1 checkpoint during the short benchmark
checkpoint_timeout='2min'
-- let's not be bottlenecked on WAL I/O
wal_buffers='128MB'
wal_compression='zstd'
-- let's get a lot of inserts done quickly
synchronous_commit='off'
-- let's not take too many breaks for vacuum delay
vacuum_cost_limit=2000
-- so we can see what happened
log_checkpoints=on
log_autovacuum_min_duration=0
-- so we can get more stats
track_wal_io_timing=on
track_io_timing=on

First I created the table that you will see later in the DDL and loaded it by running pgbench in the same way as I do in the benchmark until there was 1 GB of table data. Then I copied that out to a file 'history.data'. I included an index because the more up-to-date visibility map would benefit index-only scans -- which you could add to the benchmark if you want.

DDL
---

BEGIN;
DROP TABLE IF EXISTS history;
CREATE TABLE history(
    id BIGINT,
    client_id INT NOT NULL,
    mtime TIMESTAMPTZ DEFAULT NOW(),
    data TEXT);
COPY history FROM 'history.data' WITH (freeze on);
CREATE INDEX ON history(id);
COMMIT;

pgbench \
  --random-seed=0 \
  --no-vacuum \
  -M prepared \
  -c 4 \
  -j 4 \
  -t 1000000 \
  -R 27000 \
  -f- <<EOF
INSERT INTO history(id, client_id, data)
VALUES
  (:client_id, :client_id, repeat('a', 90)),
  (:client_id, :client_id, repeat('b', 90)),
  (:client_id, :client_id, repeat('c', 90)),
  (:client_id, :client_id, repeat('d', 90)),
  (:client_id, :client_id, repeat('e', 90)),
  (:client_id, :client_id, repeat('f', 90)),
  (:client_id, :client_id, repeat('g', 90)),
  (:client_id, :client_id, repeat('h', 90)),
  (:client_id, :client_id, repeat('i', 90)),
  (:client_id, :client_id, repeat('j', 90));
EOF

- Melanie
v1-0002-Trigger-more-frequent-autovacuums-with-relallfroz.patch
v1-0001-Add-relallfrozen-to-pg_class.patch