Hi,

On 2019-10-30 11:33:21 -0700, Peter Geoghegan wrote:
> On Mon, Apr 22, 2019 at 9:35 AM Andres Freund <and...@anarazel.de> wrote:
> > On 2019-04-21 17:46:09 -0700, Peter Geoghegan wrote:
> > > Andres has suggested that I work on teaching nbtree to accommodate
> > > variable-width, logical table identifiers, such as those required for
> > > indirect indexes, or clustered indexes, where secondary indexes must
> > > use a logical primary key value instead of a heap TID.
>
> I'm revisiting this thread now because it may have relevance to the
> nbtree deduplication patch. If nothing else, the patch further commits
> us to the current heap TID format by making assumptions about the
> width of posting lists with 6 byte TIDs.

I'd much rather not entrench this further, even leaving global indexes
aside. The 4 byte block number is a significant limitation for heap
tables too, and we should lift that at some point not too far away.
Then there are also other AMs that could really use a wider tid space.

> Though I suppose a posting list almost has to have fixed width TIDs to
> perform acceptably.

Hm. It's not clear to me why that is?

> > I think it's two more cases:
> >
> > - table AMs that want to support tables that are bigger than 32TB. That
> >   used to be unrealistic, but it's not anymore. Especially when the need
> >   to VACUUM etc is largely removed / reduced.
>
> Can we steal some bits that are currently used for offset number
> instead? 16 bits is far more than we ever need to use for heap offset
> numbers in practice.

I think that's a terrible idea. For one, some AMs will have
significantly higher limits, especially taking compression and larger
block sizes into account. Also not all AMs need identifiers tied so
closely to a disk position, e.g. zedstore does not. We shouldn't hack
ever more information into the offset, given that background.

> (I wonder if this would also have benefits for the representation of
> in-memory bitmaps?)

Hm. Not sure how?

> > - global indexes (for cross-partition unique constraints and such),
> >   which need a partition identifier as part of the tid (or as part of
> >   the index key, but I think that actually makes interaction with
> >   indexam from other layers more complicated - the inside of the index
> >   maybe may want to represent it as a column, but to the outside that
> >   ought not to be visible)
>
> Can we just use an implementation level attribute for this? Would it
> be so bad if we weren't able to jump straight to the partition number
> without walking through the tuple when the tuple has varwidth
> attributes? (If that isn't acceptable, then we can probably make it
> work for global indexes without having to generalize everything.)

Having to walk through the index tuple might be acceptable - in all
likelihood we'll have to do so anyway. It does however not *really*
resolve the issue that we still need to pass something tid-like back
from the indexam, so we can fetch the associated tuple from the heap,
or add the tid to a bitmap. But that could be done separately from the
index internal data structures.

> Generalizing the nbtree AM to be able to work with an arbitrary type
> of table row identifier that isn't at all like a TID raises tricky
> definitional questions. It would have to work in a way that made the
> new variety of table row identifier stable, which is a significant new
> requirement (and one that zheap is clearly not interested in).

Hm. I don't see why different types of TID would imply them being
stable?

> I am not suggesting that these issues are totally insurmountable. What
> I am saying is this: If we already had "stable logical" TIDs instead
> of "mostly physical TIDs", then generalizing nbtree index tuples to
> store arbitrary table row identifiers would more or less be all about
> the data structure managed by nbtree. But that isn't the case, and
> that strongly discourages me from working on this -- we shouldn't talk
> about the problem as if it is mostly just a matter of settling of the
> best index tuple format.

> Frankly I am not very enthusiastic about working on a project that has
> unclear scope and unclear benefits for users.

Why would properly supporting AMs like zedstore, global indexes,
"indirect" indexes etc benefit users?

Greetings,

Andres Freund
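
To make the 32TB figure and the "6 byte TID" shape from the discussion
concrete, here is a minimal sketch of the arithmetic. It assumes the
default 8kB block size; `SketchTid`, `SKETCH_BLCKSZ` and
`sketch_max_table_bytes()` are hypothetical names used only for
illustration and are not the actual PostgreSQL definitions.

```c
#include <stdint.h>

/* Assumption: the default block size of 8kB. */
#define SKETCH_BLCKSZ 8192ULL

/*
 * Hypothetical stand-in for a 6 byte heap TID: a 4 byte block number,
 * stored as two 16 bit halves so the struct needs no padding, plus a
 * 2 byte offset number.
 */
typedef struct SketchTid
{
    uint16_t block_hi;  /* upper 16 bits of the block number */
    uint16_t block_lo;  /* lower 16 bits of the block number */
    uint16_t offset;    /* item offset within the block */
} SketchTid;

/* Three 16 bit fields pack into exactly 6 bytes. */
_Static_assert(sizeof(SketchTid) == 6, "sketch TID should be 6 bytes");

/*
 * Upper bound on the bytes addressable with a 32 bit block number:
 * 2^32 blocks times the block size.
 */
static uint64_t
sketch_max_table_bytes(void)
{
    return ((uint64_t) 1 << 32) * SKETCH_BLCKSZ;
}
```

With an 8kB block size this comes out to 2^45 bytes, i.e. the 32TB
limit mentioned above; a larger block size raises the limit, but only
by a constant factor.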