Hi,

On 2023-01-19 15:10:38 -0800, Peter Geoghegan wrote:
> On Thu, Jan 19, 2023 at 2:54 PM Andres Freund <and...@anarazel.de> wrote:
> > Yea. Hence my musing about potentially addressing this by choosing to visit
> > additional blocks during the heap scan using vacuum's block sampling logic.
>
> I'd rather just invent a way for vacuumlazy.c to tell the top-level
> vacuum.c caller "I didn't update reltuples, but you ought to go
> ANALYZE the table now that I'm done, even if you weren't already
> planning to do so".

I'm worried about increasing the number of analyzes that much - on a subset of
workloads it's really quite slow.

Another version of this could be to integrate analyze.c's scan more closely
with vacuum all the time. It's a bit bonkers that we often sequentially read
blocks, evict them from shared buffers if we read them, just to then
afterwards do random IO for blocks we've already read. That's imo what we
eventually should do, but clearly it's not a small project.

> This wouldn't have to happen every time, but it would happen fairly often.

Do you have a mechanism for that in mind? Just something vacuum_count % 10 == 0
like? Or remember scanned_pages in pgstats and re-computing

> > IME most of the time in analyze isn't spent doing IO for the sample blocks
> > themselves, but CPU time and IO for toasted columns. A trimmed down version
> > that just computes relallvisible should be a good bit faster.
>
> I worry about that from a code maintainability point of view. I'm
> concerned that it won't be very cut down at all, in the end.

I think it'd be fine to just use analyze.c and pass in an option to not
compute column and inheritance stats.

> Presumably you'll want to add the same I/O prefetching logic to this
> cut-down version, just for example. Since without that there will be
> no competition between it and ANALYZE proper. Besides which, isn't it
> kinda wasteful to not just do a full ANALYZE? Sure, you can avoid
> detoasting overhead that way. But even still.

It's not just that analyze is expensive, I think it'll also be confusing if
the column stats change after a manual VACUUM without ANALYZE.

It shouldn't be too hard to figure out whether we're going to do an analyze
anyway and not do the rowcount-estimate version when doing VACUUM ANALYZE or
if autovacuum scheduled an analyze as well.

Greetings,

Andres Freund
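
PS: To make the last point a bit more concrete, here is a rough sketch of how
that decision could look. It is only an illustration, not a patch: none of
these names (VacuumFollowupInput, choose_vacuum_followup,
ROWCOUNT_REFRESH_INTERVAL) exist in PostgreSQL, and the "every 10th vacuum"
trigger is just the vacuum_count % 10 == 0 strawman from above, not a
recommendation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch only: these types and names are invented for
 * illustration and do not exist in vacuum.c / vacuumlazy.c / analyze.c.
 */
typedef struct VacuumFollowupInput
{
    bool        reltuples_updated;  /* did the heap scan update reltuples? */
    bool        analyze_requested;  /* VACUUM ANALYZE, or autovacuum also
                                     * scheduled an analyze for the table */
    uint64_t    vacuum_count;       /* vacuums of this table so far */
} VacuumFollowupInput;

typedef enum VacuumFollowup
{
    FOLLOWUP_NONE,              /* nothing extra to do */
    FOLLOWUP_FULL_ANALYZE,      /* regular ANALYZE is happening anyway */
    FOLLOWUP_ROWCOUNT_ONLY      /* analyze.c scan with column and
                                 * inheritance stats switched off */
} VacuumFollowup;

/* Strawman interval, taken from "vacuum_count % 10 == 0". */
#define ROWCOUNT_REFRESH_INTERVAL 10

static VacuumFollowup
choose_vacuum_followup(const VacuumFollowupInput *in)
{
    /* A full analyze is coming anyway, so skip the cut-down version. */
    if (in->analyze_requested)
        return FOLLOWUP_FULL_ANALYZE;

    /* The heap scan already refreshed reltuples; nothing to fix up. */
    if (in->reltuples_updated)
        return FOLLOWUP_NONE;

    /* Only pay for the extra scan every Nth vacuum. */
    if (in->vacuum_count % ROWCOUNT_REFRESH_INTERVAL == 0)
        return FOLLOWUP_ROWCOUNT_ONLY;

    return FOLLOWUP_NONE;
}
```

The point of ordering the checks this way is that a manual VACUUM without
ANALYZE can only ever reach the rowcount-only path, so column stats never
change behind the user's back.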