On Wed, Apr 24, 2019 at 5:22 AM Robert Haas <robertmh...@gmail.com> wrote:
> If you drop or detach a partition, you can either (a) perform, as part
> of that operation, a scan of every global index to remove all
> references to the former partition, or (b) tell each global indexes
> that all references to that partition number ought to be regarded as
> dead index tuples.  (b) makes detaching partitions faster and (a)
> seems hard to make rollback-safe, so I'm guessing we'll end up with
> (b).
I agree that (b) is the way to go.

> We don't want people to be able to exhaust the supply of partition
> numbers the way they can exhaust the supply of attribute numbers by
> adding and dropping columns repeatedly.

I agree that a partition numbering system needs to be able to
accommodate arbitrarily-many partitions over time. It wouldn't have
occurred to me to do it any other way. It is far, far easier to make
this work than it would be to retrofit varwidth attribute numbers. We
won't have to worry about the HeapTupleHeaderGetNatts()
representation. At the same time, nothing stops us from representing
partition numbers in a simpler though less space-efficient way in
system catalogs.

The main point of having global indexes is to be able to push down the
partition number and use it during index scans. We can store the
partition number at the end of the tuple on leaf pages, so that it's
easily accessible (important for VACUUM), while continuing to use the
IndexTuple fields for heap TID. On internal pages, the IndexTuple
fields must be used for the downlink (block number of child), so both
partition number and heap TID have to go towards the end of the tuples
(this happens just with heap TID on Postgres 12). Of course, suffix
truncation will manage to consistently get rid of both in most cases,
especially when the global index is a unique index.

The hard part is how to do varwidth encoding for space-efficient
partition numbers while continuing to use IndexTuple fields for heap
TID on the leaf level, *and* also having a
BTreeTupleGetHeapTID()-style macro to get partition number without
walking the entire index tuple. I suppose you could make the byte at
the end of the tuple indicate that there are in fact 31 bits total
when its high bit is set; otherwise it's a 7-bit integer. Something
like that may be the way to go.
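To make the "flag in the last byte" idea a bit more concrete, here is a
rough, standalone sketch in C. It is only an illustration: the
partnum_* names and the PARTNUM_* constants are hypothetical and do not
exist in PostgreSQL, and the byte order chosen for the 4-byte form is
just an assumption. The only property it tries to show is that the
partition number can be read starting from the end of the tuple,
without walking the tuple's attributes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch; not PostgreSQL definitions. */
#define PARTNUM_LONG_FLAG  0x80u        /* high bit of the final byte */
#define PARTNUM_MAX_SHORT  0x7Fu        /* largest 7-bit value */
#define PARTNUM_MAX_LONG   0x7FFFFFFFu  /* largest 31-bit value */

/* Number of trailing bytes the encoding of partno occupies: 1 or 4. */
static size_t
partnum_encoded_size(uint32_t partno)
{
    return (partno <= PARTNUM_MAX_SHORT) ? 1 : 4;
}

/*
 * Write partno so that its encoding finishes at "end" (one past the
 * last byte of the tuple).  Returns the number of bytes written, or 0
 * if partno does not fit in 31 bits.
 */
static size_t
partnum_encode(uint8_t *end, uint32_t partno)
{
    if (partno > PARTNUM_MAX_LONG)
        return 0;

    if (partno <= PARTNUM_MAX_SHORT)
    {
        /* Short form: high bit clear, value in the low 7 bits. */
        end[-1] = (uint8_t) partno;
        return 1;
    }

    /*
     * Long form: the final byte carries the flag plus the top 7 bits;
     * the three bytes before it carry the remaining 24 bits.
     */
    end[-1] = (uint8_t) (PARTNUM_LONG_FLAG | (partno >> 24));
    end[-2] = (uint8_t) (partno >> 16);
    end[-3] = (uint8_t) (partno >> 8);
    end[-4] = (uint8_t) partno;
    return 4;
}

/* Read the partition number back, looking only at the tuple's tail. */
static uint32_t
partnum_decode(const uint8_t *end)
{
    uint8_t last = end[-1];

    if ((last & PARTNUM_LONG_FLAG) == 0)
        return last;            /* 7-bit short form */

    return ((uint32_t) (last & 0x7Fu) << 24) |
           ((uint32_t) end[-2] << 16) |
           ((uint32_t) end[-3] << 8) |
           (uint32_t) end[-4];
}
```

A caller that knows the tuple's total size would pass a pointer to one
byte past the tuple, which is roughly what a
BTreeTupleGetHeapTID()-style accessor for partition numbers would need
to do. Alignment padding at the end of the tuple is ignored here and
would have to be dealt with separately.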
The alignment rules seem to make it worthwhile to keep the heap TID in
the tuple header; it seems inherently necessary to have a MAXALIGN()'d
tuple header, so finding a way to consistently put the first
MAXALIGN() quantum to good use seems wise.

--
Peter Geoghegan