Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

Hannu Krosing Fri, 28 Nov 2025 13:01:34 -0800

On Fri, Nov 28, 2025 at 9:42 PM Mihail Nikalayeu
<[email protected]> wrote:
>
> Hi, Hannu!
>
> I think you pressed "Reply" instead of "Reply All" - so, I put it to
> the list (looks like nothing is secret here).
> Mostly it is because of my opinion at the end of the mail which I want
> to share with the list.


Thanks, and yes, it was meant for the list.

> On Fri, Nov 28, 2025 at 8:33 PM Hannu Krosing <[email protected]> wrote:
> > If it is an *index AM* then this may not solve HOT chains issue (see
> > below), if we put it on top of *table AM* as some kind of pass-through
> > collector then likely yes, though you may still want to do final sort
> > in commit order to know which one is the latest version of updated
> > tuples which needs to go in the index. The latter is not strictly
> > needed, but would be a nice optimisation for oft-updated rows.
>
> It is AM which is added as an index (with the same
> columns/expressions/predicates) to the table before phase 1 starts.
> So, all new tuples are inserted into it.
>
> > And I would not collect just TID, but also the indexes value, as else
> > we end up accessing the table in some random order for getting the
> > value (and possibly do visibility checks)
> Just TIDs - it is ordered at validation phase (while merging with an
> main index) and read using AIO - pretty fast.

It is a space vs work compromise - you either collect it at once or
have to read it again later. Even pretty fast is still slower than
doing nothing :)

> > I am not sure where we decide that tuple is HOT-updatable, but I
> > suspect that it is before we call any index AMs, so STIR ios not
> > guaranteed to solve the issues with HOT chains.
>
> I am not sure what the HOT-chains issue is, but it actually works
> correctly already, including stress tests.
> It is even merged into one commercial fork of PG (I am not affiliated
> with it in any way).

It was about a simplistic approach for VACUUM to just ignore the CIC
backends and then missing some inserts.

> > (And yes, I have a patch in works to include old and new tids>  as part
> > of logical decoding - they are "almost there", just not passed through
> > - which would help here too to easily keep just the last value)
>
> Yes, at least it is required for the REPACK case.
>
> But....
>
> Antonin already has a prototype of patch to enable logical decoding
> for all kinds of tables in [0] (done in scope of REPACK).
>
> So, if we have such mechanics in place, it looks nice (and almost the
> same) for both CIC and REPACK:
> * in both cases we create temporary slot to collect incoming tuples
> * in both cases scan the table resetting snapshot every few pages to
> keep xmin horizon propagate
> * in both cases the process already collected part every few megabytes
> * just the logic of using collected tuples is different...
>
> So, yes, from terms of effectiveness STIR seems to be better, but such
> a common approach like LD looks tempting to have for both REPACK/CIC.

My reasoning was mainly that using something that already exists, and
must work correctly in any case, is a better long-term strategy than
adding complexity in multiple places.

After looking up when CIC appeared (v 8.2) and when logical decoding
came along (v9.4) I start to think that CIC probably would have used
LD if it had been available when CIC was added.

> On Fri, Nov 28, 2025 at 5:58 PM Matthias van de Meent
> <[email protected]> wrote:
> > -1: Requiring the logical decoding system just to reindex an index
> >  without O(tablesize) lock time adds too much overhead,
>
> How big is the additional cost of maintaining logical decoding for a
> table? Could you please evolve a little bit?
>
> Best regards,
> Mikhail.
>
>
> [0]: https://www.postgresql.org/message-id/152010.1751307725%40localhost
> (v15-0007-Enable-logical-decoding-transiently-only-for-REPACK-.patch)

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

Reply via email to