Re: MaxOffsetNumber for Table AMs

Peter Geoghegan Wed, 05 May 2021 11:22:31 -0700

On Wed, May 5, 2021 at 10:57 AM Robert Haas <[email protected]> wrote:
> One advantage of indirect indexes is that you can potentially avoid a
> lot of writes to the index. If a non-HOT update is performed, but the
> primary key is not updated, the index does not need to be touched. I
> think that's a potentially significant savings, even if bottom-up
> index deletion would have prevented the page splits. Similarly, you
> can mark a dead line pointer unused without having to scan the
> indirect index, because the index isn't pointing to that dead line
> pointer anyway.


As I said, this is equivalent to solving the "TID is a stable
identifier of logical row" issue (an exceptionally hard problem that I
don't think is worth solving), except that you make the secondary
indexes have potentially larger keys for no benefit. Sure, you can
consistently refer to a logical row using its PK value (assuming you
have this whole two-phase locking infrastructure), but why wouldn't
you "just" solve the problem with TID directly instead? What does
involving PK values actually buy you? I am pretty sure that the answer
is "less than nothing".

It is still true that I'm arguing against ever having a clustered
index table AM, which would be somewhat useful to users (that much
I'll own). The main reason for that is because we'd still be required
to solve the "TID is a stable identifier of logical row" issue, except
it's not a physiological TID/rowid (it's a fully logical row
identifier). So everything seems to lead back to that hard problem
seeming insoluble.

> Hmm, but I guess you have another cleanup problem. What prevents
> someone from inserting a new row with the same primary key as a
> previously-deleted row but different values in some indirectly-indexed
> column?

Two-phase locking in indexes stops it. Note that this is pretty much
what happens in Oracle -- it's not just SQL Server.

This is why we have rich extensibility indexing -- indexes are
strictly physical data structures in Postgres.

> And, anyway, this whole argument is predicated on the fact that the
> only table AM we have right now is heapam. If we had a table AM that
> organized the data by primary key value, we'd still want to be able to
> have secondary indexes, and they'd have to use the primary key value
> as the TID.

But Jeff has a design for the columnstore table AM where TIDs are
essentially logical (not physiological like those of heapam), that
nevertheless will work with the design around TIDs that I have in
mind. "Logical identifiers" versus "Logical identifiers that stably
identify logical rows" seems like a subtle but important distinction
here.

Of course I cannot yet rule out the possibility that this approach to
TIDs will always be good enough. But it sure seems like it might be,
and starting with the assumption that it is and working backwards
seems like a good way to attack the problem as a practical matter.

> > I think that global indexes are well worth having, and should be
> > solved some completely different way. The partition key can be an
> > additive thing.
>
> I agree that the partition identifier should be an additive thing, but
> where would we add it? It seems to me that the obvious answer is to
> make it a column of the index tuple.

Right.

> And if we can do that, why can't
> we put whatever kind of TID-like stuff people want in the index tuple,
> too? Maybe part of the problem here is that I don't actually
> understand how posting lists are represented...

You want to use the partition identifier for predicate push-down and
stuff anyway, so making it part of the TID doesn't seem particularly
natural to me.

"Posting list splits" from the nbtree README will give you some idea
of why I care about making TIDs integer-like and equi-sized within any
given index tuple. There will be similar considerations for GIN.
Though I think that nbtree deduplication is important enough on its
own to try to preserve across table AMs.

-- 
Peter Geoghegan

Re: MaxOffsetNumber for Table AMs

Reply via email to