Re: New IndexAM API controlling index vacuum strategies

Peter Geoghegan Mon, 08 Mar 2021 21:22:28 -0800

On Tue, Mar 2, 2021 at 8:49 PM Masahiko Sawada <[email protected]> wrote:
> On Tue, Mar 2, 2021 at 2:34 PM Peter Geoghegan <[email protected]> wrote:
> > lazy_vacuum_table_and_indexes() should probably not skip index
> > vacuuming when we're close to exceeding the space allocated for the
> > LVDeadTuples array. Maybe we should not skip when
> > vacrelstats->dead_tuples->num_tuples is greater than 50% of
> > dead_tuples->max_tuples? Of course, this would only need to be
> > considered when lazy_vacuum_table_and_indexes() is only called once
> > for the entire VACUUM operation (otherwise we have far too little
> > maintenance_work_mem/dead_tuples->max_tuples anyway).
>
> Doesn't it actually mean we consider how many dead *tuples* we
> collected during a vacuum? I’m not sure how important the fact we’re
> close to exceeding the maintenance_work_mem space. Suppose
> maintenance_work_mem is 64MB, we will not skip both index vacuum and
> heap vacuum if the number of dead tuples exceeds 5592404 (we can
> collect 11184809 tuples with 64MB memory). But those tuples could be
> concentrated in a small number of blocks, for example in a very large
> table case. It seems to contradict the current strategy that we want
> to skip vacuum if relatively few blocks are modified. No?


There are competing considerations. I think that we need to be
sensitive to accumulating "debt" here. The cost of index vacuuming
grows in a non-linear fashion as the index grows (or as
maintenance_work_mem is lowered). This is the kind of thing that we
should try to avoid, I think. I suspect that the cases where we can
skip index vacuuming and heap vacuuming will usually involve very few
dead tuples anyway.

We should not be sensitive to the absolute number of dead tuples when
it doesn't matter (say because they're concentrated in relatively few
heap pages). But when we overrun the maintenance_work_mem space, then
the situation changes; the number of dead tuples clearly matters just
because we run out of space for the TID array. The heap page level
skew is not really important once that happens.
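
To make the two competing checks concrete, here is a rough sketch, not
code from the patch. The function names are hypothetical. The 6-byte TID
size and 8-byte array header are assumptions chosen because they
reproduce the 11184809 figure quoted above for 64MB. The page ratio is
passed in as a parameter, since the value of SKIP_VACUUM_PAGES_RATIO is
not stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes, not taken from the patch itself. */
#define TID_BYTES          6   /* one heap TID */
#define ARRAY_HEADER_BYTES 8   /* assumed header: two 4-byte counters */

/* How many dead-tuple TIDs fit in the given amount of memory. */
static long
max_dead_tuples(size_t work_mem_bytes)
{
    assert(work_mem_bytes > ARRAY_HEADER_BYTES);
    return (long) ((work_mem_bytes - ARRAY_HEADER_BYTES) / TID_BYTES);
}

/*
 * Hypothetical decision function: returns true when index vacuuming
 * could be skipped for this VACUUM.
 */
static bool
can_skip_index_vacuum(long num_tuples, long max_tuples,
                      long pages_with_dead, long total_pages,
                      double skip_pages_ratio)
{
    /*
     * More than half of the TID array is in use: the absolute number of
     * dead tuples now matters, whatever the heap page level skew is.
     */
    if (num_tuples > max_tuples / 2)
        return false;

    /* Otherwise skip only when relatively few heap pages are affected. */
    return (double) pages_with_dead <
           (double) total_pages * skip_pages_ratio;
}
```

With 64MB, max_dead_tuples() gives 11184809, so the first check starts
to fire at 5592405 dead tuples, even if they all sit in a handful of
heap pages.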

That said, maybe there is a better algorithm. 50% was a pretty arbitrary number.

Have you thought more about how skipping index vacuuming can be
configured by users? Maybe a new storage parameter that works like the
current SKIP_VACUUM_PAGES_RATIO constant?

-- 
Peter Geoghegan
