On Mon, Apr 22, 2019 at 9:35 AM Andres Freund <and...@anarazel.de> wrote: > On 2019-04-21 17:46:09 -0700, Peter Geoghegan wrote: > > Andres has suggested that I work on teaching nbtree to accommodate > > variable-width, logical table identifiers, such as those required for > > indirect indexes, or clustered indexes, where secondary indexes must > > use a logical primary key value instead of a heap TID.
I'm revisiting this thread now because it may have relevance to the nbtree deduplication patch. If nothing else, the patch further commits us to the current heap TID format by making assumptions about the width of posting lists with 6 byte TIDs. Though I suppose a posting list almost has to have fixed width TIDs to perform acceptably. Database systems with a varwidth TID format probably don't support anything like posting lists. > I think it's two more cases: > > - table AMs that want to support tables that are bigger than 32TB. That > used to be unrealistic, but it's not anymore. Especially when the need > to VACUUM etc is largely removed / reduced. Can we steal some bits that are currently used for offset number instead? 16 bits is far more than we ever need to use for heap offset numbers in practice. (I wonder if this would also have benefits for the representation of in-memory bitmaps?) > - global indexes (for cross-partition unique constraints and such), > which need a partition identifier as part of the tid (or as part of > the index key, but I think that actually makes interaction with > indexam from other layers more complicated - the inside of the index > maybe may want to represent it as a column, but to the outside that > ought not to be visible) Can we just use an implementation level attribute for this? Would it be so bad if we weren't able to jump straight to the partition number without walking through the tuple when the tuple has varwidth attributes? (If that isn't acceptable, then we can probably make it work for global indexes without having to generalize everything.) In general, Postgres heap TIDs are not stable identifiers of rows (they only stably identify HOT chains). This is not the case in other database systems, which may go to great trouble to make it possible to assume that TIDs are stable over time (Andy Pavlo says that our TIDs are physical while Oracle's are logical -- I don't like that terminology, but know what he means). Generalizing the nbtree AM to be able to work with an arbitrary type of table row identifier that isn't at all like a TID raises tricky definitional questions. It would have to work in a way that made the new variety of table row identifier stable, which is a significant new requirement (and one that zheap is clearly not interested in). If we're willing to support something like a clustered index in nbtree, I wonder what we need to do to make things like numeric display scale still work. For bonus points, describe how unique checking works with a secondary unique index. I am not suggesting that these issues are totally insurmountable. What I am saying is this: If we already had "stable logical" TIDs instead of "mostly physical TIDs", then generalizing nbtree index tuples to store arbitrary table row identifiers would more or less be all about the data structure managed by nbtree. But that isn't the case, and that strongly discourages me from working on this -- we shouldn't talk about the problem as if it is mostly just a matter of settling of the best index tuple format. Frankly I am not very enthusiastic about working on a project that has unclear scope and unclear benefits for users. -- Peter Geoghegan