On Wed, 11 Mar 2020 at 19:00, Masahiko Sawada <masahiko.saw...@2ndquadrant.com> wrote:
>
> On Wed, 11 Mar 2020 at 13:24, Laurenz Albe <laurenz.a...@cybertec.at> wrote:
> > If you have an insert-only table that has 100000 entries, it will get
> > vacuumed roughly every 20000 new entries. The impact is probably too
> > little to care, but it will increase the contention for the three
> > autovacuum workers available by default.
>
> The same is true for read-write table, right? If that becomes a
> problem, it's a mis-configuration and user should increase these
> values just like when we set these values for read-write tables.
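For reference, the "every 20000 new entries" above presumably just comes
from applying the standard 0.2 scale factor plus a 50-row threshold to a
100000-row table. A quick sanity check, assuming those are the numbers
Laurenz had in mind:

select 50 + 0.2 * 100000 as inserts_before_insert_vacuum;  -- 20050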
It is true that if vacuum has more to do than it can do, then something
is not configured correctly. I imagine Laurenz set the scale factor to
0.0 and the threshold to 10 million to reduce the chances that someone
will encounter that problem.

I mentioned somewhere upthread that I commonly used to see production
servers running with the standard vacuum_cost_limit of 200 and the
(pre-PG12) autovacuum_vacuum_cost_delay of 20ms. Generally, it didn't go
well for them. autovacuum_vacuum_cost_delay is now 2ms by default, so
auto-vacuum in PG12 and beyond runs 10x faster, but it's still pretty
conservative, and it'll still need another bump in several years when
hardware is faster than it is today. So, by no means did that 10x
speedup mean that nobody will suffer from auto-vacuum starvation ever
again.

Now, it perhaps remains to be seen whether adding additional work onto
auto-vacuum will help or hinder those people. If their auto-vacuum can
just keep up until the cluster is old enough to need anti-wraparound
vacuums and then falls massively behind, then perhaps this is a good
thing, as they might notice at some point before their server explodes
in the middle of the night. By that time they might have become
complacent.

Additionally, I think this is pretty well aligned to the case mentioned
in the subject line of this email. We now have a freeze map, so
vacuuming to freeze tuples twice as often is not really much more
expensive in total than doing it half as often, since pages that are
already all-frozen get skipped. Even tables (e.g. log tables) that are
never queried won't become much more costly to maintain. Meanwhile, for
tables that do receive queries, we're more likely to get an index-only
scan.

Perhaps a good way to decide what the scale_factor should be set to is
to compare the run-time of an Index Only Scan with that of an Index
Scan:

create table ios (a int, b text);
insert into ios select x,x::text from generate_series(1,1000000)x;
create index on ios (a);
vacuum analyze ios;
explain (analyze, buffers) select a from ios order by a; -- on 2nd exec
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using ios_a_idx on ios  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.035..212.602 rows=1000000 loops=1)
   Heap Fetches: 0
   Buffers: shared hit=2736
 Planning Time: 0.095 ms
 Execution Time: 246.864 ms
(5 rows)

set enable_indexonlyscan=0;
explain (analyze, buffers) select a from ios order by a;
                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ios_a_idx on ios  (cost=0.42..31388.42 rows=1000000 width=4) (actual time=0.036..451.381 rows=1000000 loops=1)
   Buffers: shared hit=8140
 Planning Time: 0.089 ms
 Execution Time: 486.582 ms
(4 rows)

So, about twice as fast with the index-only scan. Whether, and how soon,
it's going to be beneficial to perform the vacuum will depend on the
read-to-insert ratio.

I'm starting to think that we should set the scale_factor to something
like 0.3 and the threshold to 50. Is anyone strongly against that? Or
Laurenz, are you really set on the 10 million threshold?
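For anyone who wants a rough idea of where 0.3 / 50 would land on their
own tables, here's a sketch against pg_class. It just plugs those
numbers into the same shape of formula the existing autovacuum trigger
uses (threshold + scale_factor * reltuples), so treat it as an
approximation of what the patch would actually compute, and remember
that reltuples is itself only an estimate:

select c.relname,
       c.reltuples::bigint as est_rows,
       (50 + 0.3 * c.reltuples)::bigint as approx_inserts_to_trigger
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname not in ('pg_catalog', 'information_schema')
order by c.reltuples desc;

And whichever defaults we end up with, a table like Laurenz's worst case
could always be tuned individually, assuming the patch exposes
reloptions mirroring the GUCs in the same way the existing autovacuum
settings do (the names below follow the GUCs being discussed and might
not match the final patch):

alter table ios
  set (autovacuum_vacuum_insert_scale_factor = 0.0,  -- assumed reloption name
       autovacuum_vacuum_insert_threshold = 10000000);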