On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> Thanks! I'll change my parallel vacuum refactoring patch accordingly.

Thanks again for working on that.

> Regarding the commit, I think that there still is one place in
> lazyvacuum.c where we can change "dead tuples" to "dead items":
>
> /*
>  * Allocate the space for dead tuples. Note that this handles parallel
>  * VACUUM initialization as part of allocating shared memory space used
>  * for dead_items.
>  */
> dead_items_alloc(vacrel, params->nworkers);
> dead_items = vacrel->dead_items;

Oops. Pushed a fixup for that just now.

> Also, the commit doesn't change both PROGRESS_VACUUM_MAX_DEAD_TUPLES
> and PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them on purpose?

That was deliberate. It would be a bit strange to alter these constants without also updating the corresponding column names in the pg_stat_progress_vacuum system view. But if I kept the definition in system_views.sql in sync, then I would break user scripts -- for reasons that users don't care about. That didn't seem like the right approach.

Also, the system as a whole still assumes that "DEAD tuples and LP_DEAD items are the same, and are just as much of a problem in the table as they are in each index". As you know, this is not really true, which is an important problem for us. Fixing it (perhaps as part of adding something like Robert's conveyor belt design) will likely require revising this model quite fundamentally (e.g., the vacthresh calculation in autovacuum.c:relation_needs_vacanalyze() would be replaced). When that happens, we'll probably need to update system views that have columns with names like "dead_tuples" -- because maybe we will no longer specifically count dead items/tuples at all.

I strongly suspect that the approach to statistics that we take for pg_statistic optimizer stats just doesn't work for dead items/tuples -- statistical sampling only produces useful statistics for the optimizer because certain delicate assumptions are met (and even these assumptions only really hold with a properly normalized database schema).
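For readers less familiar with the scheduling model referred to above: the vacthresh test reduces every table to a single "dead tuples" count compared against a threshold. Below is a minimal C sketch, assuming the rule as documented for the autovacuum daemon (vacuum threshold = base threshold + scale factor * reltuples). The function name and signature are hypothetical; this is not the actual relation_needs_vacanalyze() code, which also considers inserts, wraparound and per-table settings.

```c
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified sketch of autovacuum's dead-tuple
 * trigger.  The documented rule is
 *
 *     vacuum threshold = base threshold + scale factor * reltuples
 *
 * where the base threshold and scale factor come from
 * autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor (or the
 * per-table storage parameters).  Insert-driven vacuuming, XID/MXID
 * wraparound forcing and other inputs are deliberately not modeled.
 */
static bool
needs_vacuum(double base_thresh, double scale_factor,
             double reltuples, double dead_tuples)
{
    double vacthresh = base_thresh + scale_factor * reltuples;

    /*
     * One undifferentiated count of "dead tuples" decides the outcome.
     * Nothing here distinguishes DEAD heap tuples from LP_DEAD items, or
     * bloat in the table from bloat in each index.
     */
    return dead_tuples > vacthresh;
}
```

With the default settings (base threshold 50, scale factor 0.2), a table with 10,000 tuples qualifies only once the dead-tuple count exceeds 2,050, no matter where that garbage actually lives.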

Maybe revising the model used for autovacuum scheduling wouldn't include changing pg_stat_progress_vacuum, since that isn't technically "part of the model" -- I'm not sure. But it's not something that I am in a hurry to fix.

--
Peter Geoghegan