On Tue, Jan 19, 2021 at 2:57 PM Peter Geoghegan <p...@bowt.ie> wrote: > Looks good. I'll give this version a review now. I will do a lot more > soon. I need to come up with a good benchmark for this, that I can > return to again and again during review as needed.
I performed another benchmark, similar to the last one but with the latest version (v2), and over a much longer period. Attached is a summary of the whole benchmark, and log_autovacuum output from the logs of both the master branch and the patch. This was pgbench scale 2000, 4 indexes on pgbench_accounts, and a transaction with one update and two selects. Each run was 4 hours, and we alternate between patch and master for each run, and alternate between 16 and 32 clients. There were 8 4 hour runs in total, meaning the entire set of runs took 8 * 4 hours = 32 hours (not including initial load time and a few other small things like that). I used a 10k TPS rate limit, so TPS isn't interesting here. Latency is interesting -- we see a nice improvement in latency (i.e. a reduction) for the patch (see all.summary.out). The benefits of the patch are clearly visible when I drill down and look at the details. Each pgbench_accounts autovacuum VACUUM operation can finish faster with the patch because they can often skip at least some indexes (usually the PK, sometimes 3 out of 4 indexes total). But it's more subtle than some might assume. We're skipping indexes that VACUUM actually would have deleted *some* index tuples from, which is very good. Bottom-up index deletion is usually lazy, and only occasionally very eager, so you still have plenty of "floating garbage" index tuples in most pages. And now we see VACUUM behave a little more like bottom-up index deletion -- it is lazy when that is appropriate (with indexes that really only have floating garbage that is spread diffusely throughout the index structure), and eager when that is appropriate (with indexes that get much more garbage). The benefit is not really that we're avoiding doing I/O for index vacuuming (though that is one of the smaller benefits here). The real benefit is that VACUUM is not dirtying pages, since it skips indexes when it would be "premature" to vacuum them from an efficiency point of view. This is important because we know that Postgres throughput is very often limited by page cleaning. Also, the "economics" of this new behavior make perfect sense -- obviously it's more efficient to delay garbage cleanup until the point when the same page will be modified by a backend anyway -- in the case of this benchmark via bottom-up index deletion (which deletes all garbage tuples in the leaf page at the point that it runs for a subset of pointed-to heap pages -- it's not using an oldestXmin cutoff from 30 minutes ago). So whenever we dirty a page, we now get more value per additional-page-dirtied. I believe that controlling the number of pages dirtied by VACUUM is usually much more important than reducing the amount of read I/O from VACUUM, for reasons I go into on the recent "vacuum_cost_page_miss default value and modern hardware" thread. As a further consequence of all this, VACUUM can go faster safely and sustainably (since the cost limit is not affected so much by vacuum_cost_page_miss), which has its own benefits (e.g. oldestXmin cutoff doesn't get so old towards the end). Another closely related huge improvement that we see here is that the number of FPIs generated by VACUUM can be significantly reduced. This cost is closely related to the cost of dirtying pages, but it's worth mentioning separately. You'll see some of that in the log_autovacuum log output I attached. There is an archive with much more detailed information, including dumps from most pg_stat_* views at key intervals. This has way more information than anybody is likely to want: https://drive.google.com/file/d/1OTiErELKRZmYnuJuczO2Tfcm1-cBYITd/view?usp=sharing I did notice a problem, though. I now think that the criteria for skipping an index vacuum in the third patch from the series is too conservative, and that this led to an excessive number of index vacuums with the patch. This is probably because there was a tiny number of page splits in some of the indexes that were not really supposed to grow. I believe that this is caused by ANALYZE running -- I think that it prevented bottom-up deletion from keeping a few of the hottest pages from splitting (that can take 5 or 6 seconds) at a few points over the 32 hour run. For example, the index named "tenner" grew by 9 blocks, starting out at 230,701 and ending up at 230,710 (to see this, extract the files from the archive and "diff patch.r1c16.initial_pg_relation_size.out patch.r2c32.after_pg_relation_size.out"). I now think that 0 blocks added is unnecessarily restrictive -- a small tolerance still seems like a good idea, though (let's still be somewhat conservative about it). Maybe a better criteria would be for nbtree to always proceed with index vacuuming when the index size is less than 2048 blocks (16MiB with 8KiB BLCKSZ). If an index is larger than that, then compare the last/old block count to the current block count (at the point that we decide if index vacuuming is going to go ahead) by rounding up both values to the next highest 2048 block increment. This formula is pretty arbitrary, but probably works as well as most others. It's a good iteration for the next version of the patch/further testing, at least. BTW, it would be nice if there was more instrumentation, say in the log output produced when log_autovacuum is on. That would make it easier to run these benchmarks -- I could verify my understanding of the work done for each particular av operation represented in the log. Though the default log_autovacuum log output is quite informative, it would be nice if the specifics were more obvious (maybe this could just be for the review/testing, but it might become something for users if it seems useful). -- Peter Geoghegan
p 184875/2021-01-20 03:20:02 PST LOG: automatic vacuum of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 4 skipped due to pins, 135904 skipped frozen tuples: 489197 removed, 200708461 remain, 708441 are dead but not yet removable, oldest xmin: 100063379 buffer usage: 7844412 hits, 3394356 misses, 3840063 dirtied avg read rate: 16.395 MB/s, avg write rate: 18.547 MB/s system usage: CPU: user: 228.03 s, system: 109.38 s, elapsed: 1617.51 s WAL usage: 7997485 records, 3847296 full page images, 28974392927 bytes -- p 192246/2021-01-20 10:21:10 PST LOG: automatic aggressive vacuum to prevent wraparound of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 0 skipped due to pins, 75160 skipped frozen tuples: 478321 removed, 200719664 remain, 719664 are dead but not yet removable, oldest xmin: 200051691 buffer usage: 7842156 hits, 3596103 misses, 4021713 dirtied avg read rate: 16.007 MB/s, avg write rate: 17.901 MB/s system usage: CPU: user: 268.91 s, system: 152.14 s, elapsed: 1755.14 s WAL usage: 8355914 records, 4197453 full page images, 31439961509 bytes -- p 202988/2021-01-20 17:04:32 PST LOG: automatic aggressive vacuum of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 0 skipped due to pins, 72333 skipped frozen tuples: 461652 removed, 200740441 remain, 740488 are dead but not yet removable, oldest xmin: 303318755 buffer usage: 2784423 hits, 2184110 misses, 2140091 dirtied avg read rate: 14.719 MB/s, avg write rate: 14.423 MB/s system usage: CPU: user: 253.33 s, system: 155.56 s, elapsed: 1159.26 s WAL usage: 4123147 records, 2025541 full page images, 15054982327 bytes -- p 203596/2021-01-20 17:34:08 PST LOG: automatic vacuum of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 0 skipped due to pins, 110839 skipped frozen tuples: 289983 removed, 200663603 remain, 663603 are dead but not yet removable, oldest xmin: 318283196 buffer usage: 7770830 hits, 3679088 misses, 3178761 dirtied avg read rate: 19.971 MB/s, avg write rate: 17.255 MB/s system usage: CPU: user: 197.36 s, system: 65.85 s, elapsed: 1439.21 s WAL usage: 7559115 records, 3744079 full page images, 27971870881 bytes -- p 215108/2021-01-21 00:44:08 PST LOG: automatic aggressive vacuum of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 0 skipped due to pins, 114919 skipped frozen tuples: 248566 removed, 200666743 remain, 666743 are dead but not yet removable, oldest xmin: 432034864 buffer usage: 7755512 hits, 3717941 misses, 3161927 dirtied avg read rate: 20.574 MB/s, avg write rate: 17.497 MB/s system usage: CPU: user: 182.69 s, system: 55.97 s, elapsed: 1411.82 s WAL usage: 7357854 records, 3134635 full page images, 23391701173 bytes -- p 219344/2021-01-21 03:31:49 PST LOG: automatic aggressive vacuum of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 0 skipped due to pins, 74044 skipped frozen tuples: 474704 removed, 200727039 remain, 727039 are dead but not yet removable, oldest xmin: 535373084 buffer usage: 3060017 hits, 2310541 misses, 2117761 dirtied avg read rate: 15.810 MB/s, avg write rate: 14.491 MB/s system usage: CPU: user: 265.96 s, system: 144.36 s, elapsed: 1141.72 s WAL usage: 4430826 records, 2313365 full page images, 17243237934 bytes -- p 219821/2021-01-21 03:58:18 PST LOG: automatic vacuum of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 2 skipped due to pins, 117800 skipped frozen tuples: 254462 removed, 200672271 remain, 672271 are dead but not yet removable, oldest xmin: 546920768 buffer usage: 7755270 hits, 3750754 misses, 3961656 dirtied avg read rate: 18.595 MB/s, avg write rate: 19.640 MB/s system usage: CPU: user: 189.02 s, system: 57.89 s, elapsed: 1575.87 s WAL usage: 7323381 records, 3382354 full page images, 25238367511 bytes
all.summary.out
Description: Binary data
p 189716/2021-01-20 07:32:03 PST LOG: automatic vacuum of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 1 skipped due to pins, 134432 skipped frozen tuples: 490062 removed, 200707582 remain, 707582 are dead but not yet removable, oldest xmin: 100303792 buffer usage: 7845277 hits, 3396911 misses, 3744219 dirtied avg read rate: 16.509 MB/s, avg write rate: 18.196 MB/s system usage: CPU: user: 218.84 s, system: 113.03 s, elapsed: 1607.55 s WAL usage: 8004099 records, 3845516 full page images, 28948566363 bytes -- p 198069/2021-01-20 14:21:37 PST LOG: automatic aggressive vacuum to prevent wraparound of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 1 skipped due to pins, 75641 skipped frozen tuples: 478450 removed, 200719755 remain, 719755 are dead but not yet removable, oldest xmin: 200059579 buffer usage: 7840952 hits, 3595622 misses, 4028992 dirtied avg read rate: 16.005 MB/s, avg write rate: 17.934 MB/s system usage: CPU: user: 258.48 s, system: 149.45 s, elapsed: 1755.13 s WAL usage: 8354445 records, 4198046 full page images, 31437743453 bytes -- p 209408/2021-01-20 21:17:47 PST LOG: automatic aggressive vacuum of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 0 skipped due to pins, 72341 skipped frozen tuples: 466631 removed, 200734999 remain, 734999 are dead but not yet removable, oldest xmin: 303691852 buffer usage: 7817219 hits, 3673397 misses, 4304438 dirtied avg read rate: 15.171 MB/s, avg write rate: 17.777 MB/s system usage: CPU: user: 296.09 s, system: 170.50 s, elapsed: 1891.68 s WAL usage: 8375272 records, 4116807 full page images, 30840556321 bytes -- p 213656/2021-01-21 00:10:26 PST LOG: automatic aggressive vacuum of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 0 skipped due to pins, 72029 skipped frozen tuples: 463713 removed, 200738875 remain, 738875 are dead but not yet removable, oldest xmin: 407452635 buffer usage: 7815166 hits, 3724003 misses, 4075397 dirtied avg read rate: 15.525 MB/s, avg write rate: 16.990 MB/s system usage: CPU: user: 308.18 s, system: 175.94 s, elapsed: 1873.95 s WAL usage: 8397426 records, 4636312 full page images, 34906648661 bytes -- p 224806/2021-01-21 07:03:56 PST LOG: automatic aggressive vacuum of table "regression.public.pgbench_accounts": index scans: 1 pages: 0 removed, 3636364 remain, 0 skipped due to pins, 75220 skipped frozen tuples: 474105 removed, 200730056 remain, 730056 are dead but not yet removable, oldest xmin: 511461928 buffer usage: 7896551 hits, 3758022 misses, 4095331 dirtied avg read rate: 15.975 MB/s, avg write rate: 17.409 MB/s system usage: CPU: user: 289.15 s, system: 162.55 s, elapsed: 1837.82 s WAL usage: 8462689 records, 4118338 full page images, 30676691034 bytes