On Wed, Dec 1, 2021 at 4:42 AM Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > Thanks! I'll change my parallel vacuum refactoring patch accordingly.
>
> Thanks again for working on that.
>
> > Regarding the commit, I think that there still is one place in
> > lazyvacuum.c where we can change "dead tuples" to "dead items":
> >
> >     /*
> >      * Allocate the space for dead tuples.  Note that this handles parallel
> >      * VACUUM initialization as part of allocating shared memory space used
> >      * for dead_items.
> >      */
> >     dead_items_alloc(vacrel, params->nworkers);
> >     dead_items = vacrel->dead_items;
>
> Oops. Pushed a fixup for that just now.

Thanks!

>
> > Also, the commit doesn't change both PROGRESS_VACUUM_MAX_DEAD_TUPLES
> > and PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them on purpose?
>
> That was deliberate.
>
> It would be a bit strange to alter these constants without also
> updating the corresponding column names for the
> pg_stat_progress_vacuum system view. But if I kept the definition from
> system_views.sql in sync, then I would break user scripts -- for
> reasons that users don't care about. That didn't seem like the right
> approach.

Agreed.

>
> Also, the system as a whole still assumes "DEAD tuples and LP_DEAD
> items are the same, and are just as much of a problem in the table as
> they are in each index". As you know, this is not really true, which
> is an important problem for us. Fixing it (perhaps as part of adding
> something like Robert's conveyor belt design) will likely require
> revising this model quite fundamentally (e.g, the vacthresh
> calculation in autovacuum.c:relation_needs_vacanalyze() would be
> replaced). When this happens, we'll probably need to update system
> views that have columns with names like "dead_tuples" -- because maybe
> we no longer specifically count dead items/tuples at all. I strongly
> suspect that the approach to statistics that we take for pg_statistic
> optimizer stats just doesn't work for dead items/tuples -- statistical
> sampling only produces useful statistics for the optimizer because
> certain delicate assumptions are met (even these assumptions only
> really work with a properly normalized database schema).
>
> Maybe revising the model used for autovacuum scheduling wouldn't
> include changing pg_stat_progress_vacuum, since that isn't technically
> "part of the model" --- I'm not sure. But it's not something that I am
> in a hurry to fix.

Understood.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/