On Fri, Apr 30, 2021 at 10:39 AM Robert Haas <robertmh...@gmail.com> wrote:
> I agree up to a point but ... are you imagining that the TID continues
> to have its own special place in the page, while the partition
> identifier is stored more like a regular tuple column? Because it
> seems to me that it would be better to try to eliminate the
> special-storage case, just like we did for OIDs.

I agree in principle, but making that work well is very hard in
practice because of the format of IndexTuple -- which bleeds into
everything. That TID is special is probably a natural consequence of
the fact that we don't have an offset-based format of the kind you see
in other DB systems -- systems that don't emphasize extensibility. We
cannot jump to a hypothetical TID attribute inexpensively inside code
like _bt_compare() because we don't have a cheap way to jump straight
to the datum for any attribute. So we just store TID in IndexTuple
directly instead. Imagine how much more expensive VACUUM would be if
it had to grovel through the IndexTuple format.
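
To make the cost difference concrete, here is a minimal sketch in C. It
does not use PostgreSQL's real IndexTupleData layout: ToyTid,
ToyIndexTuple, and all of the toy_* functions are hypothetical names
invented for illustration, and the 1-byte length prefix is an assumed
stand-in for a variable-width attribute header. The sketch contrasts
reading a TID kept at a fixed position in the tuple header with reaching
a TID stored as the last ordinary attribute, which means stepping over
every attribute before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy layout, NOT PostgreSQL's actual on-disk format. */
typedef struct ToyTid
{
    uint32_t block;     /* heap block number */
    uint16_t offset;    /* line pointer offset within the block */
} ToyTid;

typedef struct ToyIndexTuple
{
    ToyTid   tid;       /* special-cased: always at a fixed position */
    uint16_t natts;     /* number of attributes stored in data[] */
    uint8_t  data[64];  /* each attribute: 1-byte length, then payload */
} ToyIndexTuple;

/* Fixed-position access: one address computation, no per-attribute work. */
static const ToyTid *
toy_tid_fixed(const ToyIndexTuple *tup)
{
    return &tup->tid;
}

/*
 * Attribute-style access: with no offset table, reaching attribute
 * 'attno' (0-based) means walking over every preceding attribute.
 * '*steps' reports how many attributes had to be skipped.
 */
static const uint8_t *
toy_nth_attr(const ToyIndexTuple *tup, int attno, uint8_t *len, int *steps)
{
    const uint8_t *p = tup->data;

    assert(attno >= 0 && attno < tup->natts);
    *steps = 0;
    for (int i = 0; i < attno; i++)
    {
        p += 1 + p[0];  /* skip the length byte plus the payload */
        (*steps)++;
    }
    *len = p[0];
    return p + 1;
}

/* Append one length-prefixed attribute; returns the new used-byte count. */
static size_t
toy_append_attr(ToyIndexTuple *tup, size_t used, const void *val, uint8_t len)
{
    assert(used + 1 + (size_t) len <= sizeof(tup->data));
    tup->data[used] = len;
    memcpy(tup->data + used + 1, val, len);
    tup->natts++;
    return used + 1 + len;
}
```

In this toy model the walk costs one step per preceding key column on
every lookup, while the fixed-position read costs the same no matter how
many columns the tuple has. That difference is what matters for code
that touches the TID of every tuple it visits.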

I wonder how the same useful performance characteristics can be
maintained with a variable-width TID design. If you solve the problem
by changing IndexTuple, then you are more or less obligated to avoid
varlena headers in order to keep the on-disk size manageable. Who knows
where it all ends?

-- 
Peter Geoghegan

