Hi,

The use of the ringbuffer in VACUUM often causes very substantial slowdowns.
The primary reason is that most/all of the buffers in the ringbuffer have typically been dirtied, with an associated WAL record, by the time VACUUM processed them. When we then reuse a buffer via the (quite small) ringbuffer, we need to flush the WAL up to that buffer's LSN before we can write it out and reclaim it. A synchronous WAL flush every few buffers ends up as a very significant bottleneck, unless you have local low-latency durable storage (i.e. local storage with some sort of power protection).

A secondary issue is that, when we end up doing multiple passes, we have to re-read data into shared_buffers that we just wrote out and evicted.

An example, on a local SSD with decently fast fdatasync ([1]): The table size is 3322MB, with ~10% updated tuples, ~30% deleted tuples, and a single index. m_w_m (maintenance_work_mem) is large enough to do this in one pass. I used pg_prewarm() on another relation to ensure the vacuumed table isn't in s_b (otherwise the ringbuffer isn't doing anything); a rough sketch of the measurement procedure is included below.

s_b     ringbuffer enabled   time      wal_syncs   wal_sync_time
128MB   1                    77797ms
128MB   0                    13676ms   241         2538ms
8GB     1                    72834ms   23976       51989ms
8GB     0                    9544ms    150         1634ms

See [2] for logs / stats of these runs. All the data here is in the OS page cache, so we don't even pay the real price for reading the data multiple times. On cloud hardware with higher fsync latency I've seen > 15x time differences between using the ringbuffers and avoiding them by using pg_prewarm.

Of course there's a good reason we have the ringbuffer - we don't want maintenance operations to completely disturb the buffer pool and harm the production workload. But if, e.g., the database isn't available due to anti-wraparound measures, there is no other workload to protect, and the ringbuffer just substantially delays the return to availability. Initial data loads could similarly benefit.

Therefore I'd like to add an option to the VACUUM command to disable the use of the ringbuffer. I'm not sure about the name yet.

I think we should auto-enable that mode once we've entered failsafe mode, similar to [auto]vacuum cost delays getting disabled (cf. lazy_check_wraparound_failsafe()). If things are bad enough that we're soon going to shut down, we want to be aggressive.
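For reference, here is a minimal sketch of roughly how each run above was measured. The relation names ("copytest_0" for the vacuumed table, "filler" for the relation used to evict it) and the FREEZE option are assumptions based on the log output in [2]; the pg_stat_wal reset/query is just one way to get at the wal_syncs / wal_sync_time numbers, and the timing columns need track_wal_io_timing = on.

  -- pg_prewarm is used to push the vacuumed table out of shared_buffers by
  -- reading an unrelated relation ("filler" is a placeholder name) that is
  -- at least as large as shared_buffers.
  CREATE EXTENSION IF NOT EXISTS pg_prewarm;
  SELECT pg_prewarm('filler');

  -- Reset WAL statistics, so pg_stat_wal afterwards reflects only the vacuum.
  SELECT pg_stat_reset_shared('wal');

  -- The run being measured; \timing in psql provides the elapsed time column.
  VACUUM (FREEZE, VERBOSE) copytest_0;

  -- wal_sync / wal_sync_time correspond to the last two columns of the
  -- summary table above.
  SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full,
         wal_write, wal_sync, wal_write_time, wal_sync_time, stats_reset
  FROM pg_stat_wal;

The "ringbuffer disabled" runs use the same procedure, just with VACUUM's use of the ringbuffer bypassed, i.e. what the proposed option would expose.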
Greetings,

Andres Freund

[1] according to pg_test_fsync:
    fdatasync                         769.189 ops/sec    1300 usecs/op

[2]

For the s_b 128MB case:

ringbuffers enabled:

2023-01-11 10:24:58.726 PST [355353][client backend][2/19:0][psql] INFO: aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:26:19.488 PST [355353][client backend][2/19:0][psql] INFO: finished vacuuming "postgres.public.copytest_0": index scans: 1
    pages: 0 removed, 424975 remain, 424975 scanned (100.00% of total)
    tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
    removable cutoff: 2981, which was 0 XIDs old when operation ended
    new relfrozenxid: 2981, which is 102 XIDs ahead of previous value
    frozen: 424975 pages from table (100.00% of total) had 6666700 tuples frozen
    index scan needed: 424975 pages from table (100.00% of total) had 4325101 dead item identifiers removed
    index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
    I/O timings: read: 2284.292 ms, write: 4325.009 ms
    avg read rate: 83.203 MB/s, avg write rate: 83.199 MB/s
    buffer usage: 425044 hits, 860113 misses, 860074 dirtied
    WAL usage: 1709902 records, 434990 full page images, 2273501683 bytes
    system usage: CPU: user: 11.62 s, system: 11.86 s, elapsed: 80.76 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes  │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │          stats_reset          │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│     8092795 │  443356 │ 2999358740 │             1569 │     28651 │    27081 │       1874.391 │     59895.674 │ 2023-01-11 10:24:58.664859-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘

ringbuffers disabled:

2023-01-11 10:23:05.081 PST [355054][client backend][2/19:0][psql] INFO: aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:23:18.755 PST [355054][client backend][2/19:0][psql] INFO: finished vacuuming "postgres.public.copytest_0": index scans: 1
    pages: 0 removed, 424979 remain, 424979 scanned (100.00% of total)
    tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
    removable cutoff: 2879, which was 0 XIDs old when operation ended
    new relfrozenxid: 2879, which is 102 XIDs ahead of previous value
    frozen: 424979 pages from table (100.00% of total) had 6666700 tuples frozen
    index scan needed: 424979 pages from table (100.00% of total) had 4325176 dead item identifiers removed
    index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
    I/O timings: read: 1247.366 ms, write: 2888.756 ms
    avg read rate: 491.485 MB/s, avg write rate: 491.395 MB/s
    buffer usage: 424927 hits, 860242 misses, 860083 dirtied
    WAL usage: 1709918 records, 434994 full page images, 2273503049 bytes
    system usage: CPU: user: 5.42 s, system: 6.26 s, elapsed: 13.67 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes  │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │          stats_reset          │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│     8092963 │  443362 │ 2999373996 │           212190 │    212333 │      241 │       1209.516 │      2538.706 │ 2023-01-11 10:23:05.004783-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘

For the s_b 8GB case:

ringbuffers enabled:

2023-01-11 10:04:12.479 PST [352665][client backend][2/19:0][psql] INFO: aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:05:25.312 PST [352665][client backend][2/19:0][psql] INFO: finished vacuuming "postgres.public.copytest_0": index scans: 1
    pages: 0 removed, 424977 remain, 424977 scanned (100.00% of total)
    tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
    removable cutoff: 2675, which was 0 XIDs old when operation ended
    new relfrozenxid: 2675, which is 102 XIDs ahead of previous value
    frozen: 424977 pages from table (100.00% of total) had 6666700 tuples frozen
    index scan needed: 424977 pages from table (100.00% of total) had 4325066 dead item identifiers removed
    index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
    I/O timings: read: 2610.875 ms, write: 4177.842 ms
    avg read rate: 81.688 MB/s, avg write rate: 87.515 MB/s
    buffer usage: 523611 hits, 761552 misses, 815868 dirtied
    WAL usage: 1709910 records, 434992 full page images, 2273502729 bytes
    system usage: CPU: user: 11.00 s, system: 11.86 s, elapsed: 72.83 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes  │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │          stats_reset          │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│     8092707 │  443358 │ 2999354090 │            42259 │     66227 │    23976 │       2050.963 │     51989.099 │ 2023-01-11 10:04:12.404054-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘

ringbuffers disabled:

2023-01-11 10:08:48.414 PST [353287][client backend][3/19:0][psql] INFO: aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:08:57.956 PST [353287][client backend][3/19:0][psql] INFO: finished vacuuming "postgres.public.copytest_0": index scans: 1
    pages: 0 removed, 424977 remain, 424977 scanned (100.00% of total)
    tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
    removable cutoff: 2777, which was 0 XIDs old when operation ended
    new relfrozenxid: 2777, which is 102 XIDs ahead of previous value
    frozen: 424977 pages from table (100.00% of total) had 6666700 tuples frozen
    index scan needed: 424976 pages from table (100.00% of total) had 4325153 dead item identifiers removed
    index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
    I/O timings: read: 1040.230 ms, write: 0.000 ms
    avg read rate: 312.016 MB/s, avg write rate: 356.242 MB/s
    buffer usage: 904078 hits, 381084 misses, 435101 dirtied
    WAL usage: 1709908 records, 434992 full page images, 2273499663 bytes
    system usage: CPU: user: 5.57 s, system: 2.26 s, elapsed: 9.54 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes  │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │          stats_reset          │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│     8092933 │  443358 │ 2999364596 │           236354 │    236398 │      150 │       1166.314 │      1634.408 │ 2023-01-11 10:08:48.350328-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘