Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation

Peter Geoghegan Wed, 30 Oct 2019 11:34:35 -0700

On Mon, Apr 22, 2019 at 9:35 AM Andres Freund <[email protected]> wrote:
> On 2019-04-21 17:46:09 -0700, Peter Geoghegan wrote:
> > Andres has suggested that I work on teaching nbtree to accommodate
> > variable-width, logical table identifiers, such as those required for
> > indirect indexes, or clustered indexes, where secondary indexes must
> > use a logical primary key value instead of a heap TID.


I'm revisiting this thread now because it may have relevance to the
nbtree deduplication patch. If nothing else, the patch further commits
us to the current heap TID format by making assumptions about the
width of posting lists with 6 byte TIDs. Though I suppose a posting
list almost has to have fixed width TIDs to perform acceptably.
Database systems with a varwidth TID format probably don't support
anything like posting lists.

> I think it's two more cases:
>
> - table AMs that want to support tables that are bigger than 32TB. That
>   used to be unrealistic, but it's not anymore. Especially when the need
>   to VACUUM etc is largely removed / reduced.

Can we steal some bits that are currently used for offset number
instead? 16 bits is far more than we ever need to use for heap offset
numbers in practice. (I wonder if this would also have benefits for
the representation of in-memory bitmaps?)

> - global indexes (for cross-partition unique constraints and such),
>   which need a partition identifier as part of the tid (or as part of
>   the index key, but I think that actually makes interaction with
>   indexam from other layers more complicated - the inside of the index
>   maybe may want to represent it as a column, but to the outside that
>   ought not to be visible)

Can we just use an implementation level attribute for this? Would it
be so bad if we weren't able to jump straight to the partition number
without walking through the tuple when the tuple has varwidth
attributes? (If that isn't acceptable, then we can probably make it
work for global indexes without having to generalize everything.)

In general, Postgres heap TIDs are not stable identifiers of rows
(they only stably identify HOT chains). This is not the case in other
database systems, which may go to great trouble to make it possible to
assume that TIDs are stable over time (Andy Pavlo says that our TIDs
are physical while Oracle's are logical -- I don't like that
terminology, but know what he means). Generalizing the nbtree AM to be
able to work with an arbitrary type of table row identifier that isn't
at all like a TID raises tricky definitional questions. It would have
to work in a way that made the new variety of table row identifier
stable, which is a significant new requirement (and one that zheap is
clearly not interested in).

If we're willing to support something like a clustered index in
nbtree, I wonder what we need to do to make things like numeric
display scale still work. For bonus points, describe how unique
checking works with a secondary unique index.

I am not suggesting that these issues are totally insurmountable. What
I am saying is this: If we already had "stable logical" TIDs instead
of "mostly physical TIDs", then generalizing nbtree index tuples to
store arbitrary table row identifiers would more or less be all about
the data structure managed by nbtree. But that isn't the case, and
that strongly discourages me from working on this -- we shouldn't talk
about the problem as if it is mostly just a matter of settling of the
best index tuple format. Frankly I am not very enthusiastic about
working on a project that has unclear scope and unclear benefits for
users.

-- 
Peter Geoghegan

Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation

Reply via email to