> From: Peter Geoghegan <p...@bowt.ie>
> Sent: July 9, 2019 3:01 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density 
> as new implicit FILLFACTOR
>
> On Tue, Jul 9, 2019 at 11:27 AM John Lumby <johnlu...@hotmail.com> wrote:
> > And the point of the REINDEX at that point (below) is to remove dead tuple 
> > keys-tids
> > and  reorganize those split pages back into physical order without losing 
> > the freespace.
>
> VACUUM already removes the tuples, accounting for all overhead.
>
> We could in principle come up with a way of moving pages around,
>  [ ... ]
> That would either necessitate that the command acquire a disruptive lock
>  [ ... ]
> Neither of which seem particularly appealing.

I was not thinking of a new command,  just an extension of the existing REINDEX
which would apply a fillfactor equal to current average page density,
by adding a preliminary step to sample that first.
Of course,   the user can do that for themselves by a series of steps with
ANALYZE, get page_density from pgstattuple,  ALTER INDEX,  REINDEX
so this new parameter would be a convenience,  assuming that this sequence
actually is beneficial,   which I believe it is  -  see my next.

>
> I believe that this is a lot more important in systems that generally
> use clustered indexes, such as MS SQL Server. This kind of
> "fragmentation" isn't usually much of a problem when using Postgres.
>
We have found that, for an index which has both experienced large number of 
page splits
and whose table has a large number of dead tuples (despite autovacuum),
REINDEX with FILLFACTOR set to current page_density does produce a performance 
improvement,
and also does reduce future growth in number of pages.    I don't have numbers 
to
hand,  and in fact not sure if any catalog view or pgstattuple tells me about 
the proportion
of dead key-tids in the index itself (do you know of any source?) as opposed to 
the table,
but based on that recollection,  yes,   REINDEX can reduce fragmentation.

However we did not run a VACUUM command first.     Maybe if we had run VACUUM 
instead of
the REINDEX commands,   we might have obtained the same degree of improvement,  
I don't know.
I think this was Tom's point earlier on in this thread.

Correct me if I'm wrong but I believe whether an index is "clustered" or not is 
not relevant for
this discussion because the clustering in that context is referring to ordering 
of the
table pages,  not the index pages.    I believe it is quite possible to have a 
perfectly
"clustered" table whose clustering index is itself badly disorganized.

Cheers,     John

Reply via email to