Re: New IndexAM API controlling index vacuum strategies

Peter Geoghegan Thu, 21 Jan 2021 21:01:06 -0800

On Tue, Jan 19, 2021 at 2:57 PM Peter Geoghegan <p...@bowt.ie> wrote:
> Looks good. I'll give this version a review now. I will do a lot more
> soon. I need to come up with a good benchmark for this, that I can
> return to again and again during review as needed.


I performed another benchmark, similar to the last one but with the
latest version (v2), and over a much longer period. Attached is a
summary of the whole benchmark, and log_autovacuum output from the
logs of both the master branch and the patch.

This was pgbench scale 2000, 4 indexes on pgbench_accounts, and a
transaction with one update and two selects. Each run was 4 hours, and
we alternate between patch and master for each run, and alternate
between 16 and 32 clients. There were 8 4 hour runs in total, meaning
the entire set of runs took 8 * 4 hours = 32 hours (not including
initial load time and a few other small things like that). I used a
10k TPS rate limit, so TPS isn't interesting here. Latency is
interesting -- we see a nice improvement in latency (i.e. a reduction)
for the patch (see all.summary.out).

The benefits of the patch are clearly visible when I drill down and
look at the details. Each pgbench_accounts autovacuum VACUUM operation
can finish faster with the patch because they can often skip at least
some indexes (usually the PK, sometimes 3 out of 4 indexes total). But
it's more subtle than some might assume. We're skipping indexes that
VACUUM actually would have deleted *some* index tuples from, which is
very good. Bottom-up index deletion is usually lazy, and only
occasionally very eager, so you still have plenty of "floating
garbage" index tuples in most pages. And now we see VACUUM behave a
little more like bottom-up index deletion -- it is lazy when that is
appropriate (with indexes that really only have floating garbage that
is spread diffusely throughout the index structure), and eager when
that is appropriate (with indexes that get much more garbage).

The benefit is not really that we're avoiding doing I/O for index
vacuuming (though that is one of the smaller benefits here). The real
benefit is that VACUUM is not dirtying pages, since it skips indexes
when it would be "premature" to vacuum them from an efficiency point
of view. This is important because we know that Postgres throughput is
very often limited by page cleaning. Also, the "economics" of this new
behavior make perfect sense -- obviously it's more efficient to delay
garbage cleanup until the point when the same page will be modified by
a backend anyway -- in the case of this benchmark via bottom-up index
deletion (which deletes all garbage tuples in the leaf page at the
point that it runs for a subset of pointed-to heap pages -- it's not
using an oldestXmin cutoff from 30 minutes ago). So whenever we dirty
a page, we now get more value per additional-page-dirtied.

I believe that controlling the number of pages dirtied by VACUUM is
usually much more important than reducing the amount of read I/O from
VACUUM, for reasons I go into on the recent "vacuum_cost_page_miss
default value and modern hardware" thread. As a further consequence of
all this, VACUUM can go faster safely and sustainably (since the cost
limit is not affected so much by vacuum_cost_page_miss), which has its
own benefits (e.g. oldestXmin cutoff doesn't get so old towards the
end).

Another closely related huge improvement that we see here is that the
number of FPIs generated by VACUUM can be significantly reduced. This
cost is closely related to the cost of dirtying pages, but it's worth
mentioning separately. You'll see some of that in the log_autovacuum
log output I attached.

There is an archive with much more detailed information, including
dumps from most pg_stat_* views at key intervals. This has way more
information than anybody is likely to want:

https://drive.google.com/file/d/1OTiErELKRZmYnuJuczO2Tfcm1-cBYITd/view?usp=sharing

I did notice a problem, though. I now think that the criteria for
skipping an index vacuum in the third patch from the series is too
conservative, and that this led to an excessive number of index
vacuums with the patch. This is probably because there was a tiny
number of page splits in some of the indexes that were not really
supposed to grow. I believe that this is caused by ANALYZE running --
I think that it prevented bottom-up deletion from keeping a few of the
hottest pages from splitting (that can take 5 or 6 seconds) at a few
points over the 32 hour run. For example, the index named "tenner"
grew by 9 blocks, starting out at 230,701 and ending up at 230,710 (to
see this, extract the files from the archive and "diff
patch.r1c16.initial_pg_relation_size.out
patch.r2c32.after_pg_relation_size.out").

I now think that 0 blocks added is unnecessarily restrictive -- a
small tolerance still seems like a good idea, though (let's still be
somewhat conservative about it).

Maybe a better criteria would be for nbtree to always proceed with
index vacuuming when the index size is less than 2048 blocks (16MiB
with 8KiB BLCKSZ). If an index is larger than that, then compare the
last/old block count to the current block count (at the point that we
decide if index vacuuming is going to go ahead) by rounding up both
values to the next highest 2048 block increment. This formula is
pretty arbitrary, but probably works as well as most others. It's a
good iteration for the next version of the patch/further testing, at
least.

BTW, it would be nice if there was more instrumentation, say in the
log output produced when log_autovacuum is on. That would make it
easier to run these benchmarks -- I could verify my understanding of
the work done for each particular av operation represented in the log.
Though the default log_autovacuum log output is quite informative, it
would be nice if the specifics were more obvious (maybe this could
just be for the review/testing, but it might become something for
users if it seems useful).

-- 
Peter Geoghegan

p 184875/2021-01-20 03:20:02 PST LOG:  automatic vacuum of table 
"regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 4 skipped due to pins, 135904 skipped 
frozen
        tuples: 489197 removed, 200708461 remain, 708441 are dead but not yet 
removable, oldest xmin: 100063379
        buffer usage: 7844412 hits, 3394356 misses, 3840063 dirtied
        avg read rate: 16.395 MB/s, avg write rate: 18.547 MB/s
        system usage: CPU: user: 228.03 s, system: 109.38 s, elapsed: 1617.51 s
        WAL usage: 7997485 records, 3847296 full page images, 28974392927 bytes
--
p 192246/2021-01-20 10:21:10 PST LOG:  automatic aggressive vacuum to prevent 
wraparound of table "regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 0 skipped due to pins, 75160 skipped 
frozen
        tuples: 478321 removed, 200719664 remain, 719664 are dead but not yet 
removable, oldest xmin: 200051691
        buffer usage: 7842156 hits, 3596103 misses, 4021713 dirtied
        avg read rate: 16.007 MB/s, avg write rate: 17.901 MB/s
        system usage: CPU: user: 268.91 s, system: 152.14 s, elapsed: 1755.14 s
        WAL usage: 8355914 records, 4197453 full page images, 31439961509 bytes
--
p 202988/2021-01-20 17:04:32 PST LOG:  automatic aggressive vacuum of table 
"regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 0 skipped due to pins, 72333 skipped 
frozen
        tuples: 461652 removed, 200740441 remain, 740488 are dead but not yet 
removable, oldest xmin: 303318755
        buffer usage: 2784423 hits, 2184110 misses, 2140091 dirtied
        avg read rate: 14.719 MB/s, avg write rate: 14.423 MB/s
        system usage: CPU: user: 253.33 s, system: 155.56 s, elapsed: 1159.26 s
        WAL usage: 4123147 records, 2025541 full page images, 15054982327 bytes
--
p 203596/2021-01-20 17:34:08 PST LOG:  automatic vacuum of table 
"regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 0 skipped due to pins, 110839 skipped 
frozen
        tuples: 289983 removed, 200663603 remain, 663603 are dead but not yet 
removable, oldest xmin: 318283196
        buffer usage: 7770830 hits, 3679088 misses, 3178761 dirtied
        avg read rate: 19.971 MB/s, avg write rate: 17.255 MB/s
        system usage: CPU: user: 197.36 s, system: 65.85 s, elapsed: 1439.21 s
        WAL usage: 7559115 records, 3744079 full page images, 27971870881 bytes
--
p 215108/2021-01-21 00:44:08 PST LOG:  automatic aggressive vacuum of table 
"regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 0 skipped due to pins, 114919 skipped 
frozen
        tuples: 248566 removed, 200666743 remain, 666743 are dead but not yet 
removable, oldest xmin: 432034864
        buffer usage: 7755512 hits, 3717941 misses, 3161927 dirtied
        avg read rate: 20.574 MB/s, avg write rate: 17.497 MB/s
        system usage: CPU: user: 182.69 s, system: 55.97 s, elapsed: 1411.82 s
        WAL usage: 7357854 records, 3134635 full page images, 23391701173 bytes
--
p 219344/2021-01-21 03:31:49 PST LOG:  automatic aggressive vacuum of table 
"regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 0 skipped due to pins, 74044 skipped 
frozen
        tuples: 474704 removed, 200727039 remain, 727039 are dead but not yet 
removable, oldest xmin: 535373084
        buffer usage: 3060017 hits, 2310541 misses, 2117761 dirtied
        avg read rate: 15.810 MB/s, avg write rate: 14.491 MB/s
        system usage: CPU: user: 265.96 s, system: 144.36 s, elapsed: 1141.72 s
        WAL usage: 4430826 records, 2313365 full page images, 17243237934 bytes
--
p 219821/2021-01-21 03:58:18 PST LOG:  automatic vacuum of table 
"regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 2 skipped due to pins, 117800 skipped 
frozen
        tuples: 254462 removed, 200672271 remain, 672271 are dead but not yet 
removable, oldest xmin: 546920768
        buffer usage: 7755270 hits, 3750754 misses, 3961656 dirtied
        avg read rate: 18.595 MB/s, avg write rate: 19.640 MB/s
        system usage: CPU: user: 189.02 s, system: 57.89 s, elapsed: 1575.87 s
        WAL usage: 7323381 records, 3382354 full page images, 25238367511 bytes

all.summary.out
Description: Binary data

p 189716/2021-01-20 07:32:03 PST LOG:  automatic vacuum of table 
"regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 1 skipped due to pins, 134432 skipped 
frozen
        tuples: 490062 removed, 200707582 remain, 707582 are dead but not yet 
removable, oldest xmin: 100303792
        buffer usage: 7845277 hits, 3396911 misses, 3744219 dirtied
        avg read rate: 16.509 MB/s, avg write rate: 18.196 MB/s
        system usage: CPU: user: 218.84 s, system: 113.03 s, elapsed: 1607.55 s
        WAL usage: 8004099 records, 3845516 full page images, 28948566363 bytes
--
p 198069/2021-01-20 14:21:37 PST LOG:  automatic aggressive vacuum to prevent 
wraparound of table "regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 1 skipped due to pins, 75641 skipped 
frozen
        tuples: 478450 removed, 200719755 remain, 719755 are dead but not yet 
removable, oldest xmin: 200059579
        buffer usage: 7840952 hits, 3595622 misses, 4028992 dirtied
        avg read rate: 16.005 MB/s, avg write rate: 17.934 MB/s
        system usage: CPU: user: 258.48 s, system: 149.45 s, elapsed: 1755.13 s
        WAL usage: 8354445 records, 4198046 full page images, 31437743453 bytes
--
p 209408/2021-01-20 21:17:47 PST LOG:  automatic aggressive vacuum of table 
"regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 0 skipped due to pins, 72341 skipped 
frozen
        tuples: 466631 removed, 200734999 remain, 734999 are dead but not yet 
removable, oldest xmin: 303691852
        buffer usage: 7817219 hits, 3673397 misses, 4304438 dirtied
        avg read rate: 15.171 MB/s, avg write rate: 17.777 MB/s
        system usage: CPU: user: 296.09 s, system: 170.50 s, elapsed: 1891.68 s
        WAL usage: 8375272 records, 4116807 full page images, 30840556321 bytes
--
p 213656/2021-01-21 00:10:26 PST LOG:  automatic aggressive vacuum of table 
"regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 0 skipped due to pins, 72029 skipped 
frozen
        tuples: 463713 removed, 200738875 remain, 738875 are dead but not yet 
removable, oldest xmin: 407452635
        buffer usage: 7815166 hits, 3724003 misses, 4075397 dirtied
        avg read rate: 15.525 MB/s, avg write rate: 16.990 MB/s
        system usage: CPU: user: 308.18 s, system: 175.94 s, elapsed: 1873.95 s
        WAL usage: 8397426 records, 4636312 full page images, 34906648661 bytes
--
p 224806/2021-01-21 07:03:56 PST LOG:  automatic aggressive vacuum of table 
"regression.public.pgbench_accounts": index scans: 1
        pages: 0 removed, 3636364 remain, 0 skipped due to pins, 75220 skipped 
frozen
        tuples: 474105 removed, 200730056 remain, 730056 are dead but not yet 
removable, oldest xmin: 511461928
        buffer usage: 7896551 hits, 3758022 misses, 4095331 dirtied
        avg read rate: 15.975 MB/s, avg write rate: 17.409 MB/s
        system usage: CPU: user: 289.15 s, system: 162.55 s, elapsed: 1837.82 s
        WAL usage: 8462689 records, 4118338 full page images, 30676691034 bytes

Re: New IndexAM API controlling index vacuum strategies

Reply via email to