On Fri, Apr 30, 2021 at 10:39 AM Robert Haas <robertmh...@gmail.com> wrote:
> I agree up to a point but ... are you imagining that the TID continues
> to have its own special place in the page, while the partition
> identifier is stored more like a regular tuple column? Because it
> seems to me that it would be better to try to eliminate the
> special-storage case, just like we did for OIDs.

I agree in principle, but making that work well is very hard in practice because of the format of IndexTuple -- which bleeds into everything. That TID is special is probably a natural consequence of the fact that we don't have an offset-based format of the kind you see in other DB systems -- systems that don't emphasize extensibility. We cannot jump to a hypothetical TID attribute inexpensively inside code like _bt_compare() because we don't have a cheap way to jump straight to the datum for any attribute. So we just store TID in IndexTuple directly instead. Imagine how much more expensive VACUUM would be if it had to grovel through the IndexTuple format.

I wonder how the same useful performance characteristics can be maintained with a variable-width TID design. If you solve the problem by changing IndexTuple, then you are kind of obligated to not use varlena headers to keep the on-disk size manageable. Who knows where it all ends?

--
Peter Geoghegan
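
The cost argument above (a TID at a fixed position versus a TID stored as an ordinary attribute) can be illustrated with a minimal C sketch. Everything here is hypothetical and simplified for illustration: DemoTid, DemoIndexTuple, and the demo_* functions are invented names, and the 1-byte length prefix is an assumed stand-in for variable-width attributes, not PostgreSQL's actual IndexTuple layout or varlena format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT PostgreSQL's real structs. */
typedef struct DemoTid
{
    uint32_t block;   /* heap block number */
    uint16_t offset;  /* item position within the block */
} DemoTid;

typedef struct DemoIndexTuple
{
    DemoTid       tid;       /* fixed position at the start of every tuple */
    uint16_t      natts;     /* number of attributes stored in data[] */
    unsigned char data[64];  /* attributes: 1-byte length, then payload */
} DemoIndexTuple;

/* Constant time: the TID lives at a known offset, so no parsing is needed. */
static DemoTid
demo_get_tid(const DemoIndexTuple *tup)
{
    return tup->tid;
}

/* Append one length-prefixed attribute; returns the new used-byte count. */
static size_t
demo_append_attr(DemoIndexTuple *tup, size_t used,
                 const void *payload, uint8_t len)
{
    assert(used + 1 + (size_t) len <= sizeof(tup->data));
    tup->data[used] = len;
    memcpy(tup->data + used + 1, payload, len);
    tup->natts++;
    return used + 1 + len;
}

/*
 * Linear time: with no per-attribute offset table, reaching attribute
 * attno (0-based) means stepping over every attribute before it.
 * *steps reports how many attributes were skipped on the way.
 */
static const unsigned char *
demo_get_attr(const DemoIndexTuple *tup, int attno,
              uint8_t *len, int *steps)
{
    const unsigned char *p = tup->data;

    assert(attno >= 0 && attno < tup->natts);
    *steps = 0;
    for (int i = 0; i < attno; i++)
    {
        p += 1 + p[0];   /* skip the length byte plus its payload */
        (*steps)++;
    }
    *len = p[0];
    return p + 1;
}
```

In this sketch, a TID stored as the last attribute would be reachable only through demo_get_attr(), so every comparison or VACUUM-style scan would pay for a walk over the preceding attributes, while demo_get_tid() reads it directly regardless of how many attributes the tuple has.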