On Fri, Apr 30, 2021 at 1:10 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I agree that global indexes need more bits, but it doesn't necessarily
> follow that we must have variable-width TIDs. We could for example
> say that "real" TIDs are only 48 bits and index AMs that want to be
> usable as global indexes must be capable of handling 64-bit TIDs,
> leaving 16 bits for partition ID. A more forward-looking definition
> would require global index AMs to store 96 bits (partition OID plus
> 64-bit TID). Either way would be far simpler for every moving part
> involved than going over to full varlena TIDs.
16 bits is not much for a partition identifier. We've already had complaints about INNER_VAR being too small, so apparently there are people who want to use really large numbers of partitions. But even if we imagine a hypothetical world where nobody uses more than a couple thousand partitions at once, it's very reasonable to want to avoid recycling partition identifiers so that detaching a partition can be O(1). There's no way that's going to be viable if the whole address space is only 16 bits, because with time series data people are going to be continually creating new partitions and dropping old ones. I'd guess it is viable with 32 bits, but we'd need a mapping layer rather than using the OID directly, to avoid wraparound collisions.

Now, this problem can be avoided by just requiring the AM to store more bits, exactly as you say. I suspect 96 bits is large enough for all of the practical use cases people have, or at least within spitting distance. But it strikes me as relatively inefficient to always store 96 bits for every TID. I certainly don't think we want to break on-disk compatibility and widen every existing btree index by changing all the 6-byte TIDs they store now into 12-byte TIDs that are at least half zero bytes. So I think we're bound to end up with at least two options: 6 and 12 bytes. But variable-width would be a lot nicer: you could store small TIDs and small partition identifiers very compactly, and only use the full number of bytes when the situation demands it.

> > What problem do you think this proposal does solve?
>
> Accommodating table AMs that want more than 48 bits for a TID.
> We're already starting to run up against the fact that that's not
> enough bits for plausible use-cases. 64 bits may someday in the far
> future not be enough either, but I think that's a very long way off.
Do people actually want to store more than 2^48 rows in a table, or is this more about the division of a TID into a block number and an item number?

-- 
Robert Haas
EDB: http://www.enterprisedb.com