On 25/04/2024 at 22:21, Robert Haas wrote:
The analyze case, I feel, is really murky. autovacuum_analyze_scale_factor stands for the proposition that as the table becomes larger, analyze doesn't need to be done as often. If what you're concerned about is the frequency estimates, that's true: an injection of a million new rows can shift frequencies dramatically in a small table, but the effect is blunted in a large one. But a lot of the cases I've seen have involved the histogram boundaries. If you're inserting data into a table in increasing order, every new million rows shifts the boundary of the last histogram bucket by the same amount. You either need those rows included in the histogram to get good query plans, or you don't. If you do, the frequency with which you need to analyze does not change as the table grows. If you don't, then it probably does. But the answer doesn't really depend on how big the table is already, but on your workload. So it's unclear to me that the proposed parameter is the right idea here at all. It's also unclear to me that the existing system is the right idea. 🙂
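(For reference, autoanalyze fires once the number of rows modified since the last analyze exceeds autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples, i.e. 50 + 0.10 * reltuples with the defaults: roughly 100k changed rows for a 1M-row table, but roughly 10M for a 100M-row table, even though an append-only workload may have pushed the histogram boundary just as far in both cases.)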
This is very interesting. And what about ndistinct? I believe it could be problematic, too, in some (admittedly rare or pathological) cases.
For example, suppose that the actual number of distinct values for a particular column grows from 1000 to 200000 after a batch of insertions. OK, in such a case, the default analyze sampling isn't large enough to compute an ndistinct close enough to reality anyway. But without any analyze at all, it can lead to very bad planning - think of a Nested Loop with a parallel seq scan on the outer table instead of a simple, efficient index scan, because the index scan of the inner table is overestimated (both each index scan's cost and the number of rows returned).
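To make that concrete, here is a minimal sketch of the scenario (the table, column names and row counts are made up; the exact estimates will vary with the statistics target and with whether autovacuum has managed to analyze the table in the meantime). The same per-value estimate is what drives the cost and row count of the inner index scan in the join case:

CREATE TABLE inner_tbl (grp int, payload text);
CREATE INDEX ON inner_tbl (grp);

-- 1000 distinct values in grp, 100 rows each; the statistics reflect this state.
INSERT INTO inner_tbl
SELECT g % 1000, repeat('x', 10) FROM generate_series(1, 100000) g;
ANALYZE inner_tbl;

-- Batch insert raises the distinct count to ~200000, with no new analyze.
INSERT INTO inner_tbl
SELECT g % 200000, repeat('x', 10) FROM generate_series(1, 2000000) g;

-- The planner scales the stale reltuples up to the table's current size,
-- but pg_stats.n_distinct is still 1000, so the equality estimate for a
-- value comes out around current_rows / 1000 (a few thousand rows)
-- instead of the actual ~10.
EXPLAIN SELECT * FROM inner_tbl WHERE grp = 42;

Running the EXPLAIN with ANALYZE shows the gap between estimated and actual rows, which is what tips the planner away from the plain index-scan plan in the real join.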