Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation

Peter Geoghegan Mon, 22 Apr 2019 10:17:17 -0700

On Mon, Apr 22, 2019 at 8:36 AM Stephen Frost <sfr...@snowman.net> wrote:
> This seems like it would be helpful for global indexes as well, wouldn't
> it?


Yes, though that should probably work by reusing what we already do
with heap TID (use standard IndexTuple fields on the leaf level for
heap TID), plus an additional identifier for the partition number that
is located at the physical end of the tuple. IOW, I think that this
might benefit from a design that is halfway between what we already
do with heap TIDs and what we would be required to do to make varwidth
logical row identifiers in tables work -- the partition number is
varwidth, though often only a single byte.
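To illustrate "varwidth, though often only a single byte", here is a
minimal sketch of one way a partition number could sit at the physical
end of a tuple and be decoded backwards from the tuple's end. The
encoding (7 payload bits per byte, high bit meaning "another byte
precedes this one") and every sketch_* name are assumptions made for
illustration only; nothing in this thread settles on an encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: bytes needed to store partnum at 7 payload bits per byte. */
size_t
sketch_partnum_len(uint32_t partnum)
{
    size_t      n = 1;

    while ((partnum >>= 7) != 0)
        n++;
    return n;
}

/*
 * Hypothetical: write partnum into out[0..n-1] and return n.  The least
 * significant 7 bits land in the LAST byte, so a reader that only knows
 * where the tuple ends can start decoding there.
 */
size_t
sketch_partnum_encode(uint32_t partnum, uint8_t *out)
{
    size_t      n = sketch_partnum_len(partnum);
    size_t      i;

    for (i = 0; i < n; i++)
    {
        uint8_t     b = partnum & 0x7F;

        partnum >>= 7;
        if (partnum != 0)
            b |= 0x80;          /* another byte precedes this one */
        out[n - 1 - i] = b;
    }
    return n;
}

/*
 * Hypothetical: decode walking backwards from tuple_end (one past the last
 * byte of the tuple).  *len is set to the number of bytes consumed.
 */
uint32_t
sketch_partnum_decode(const uint8_t *tuple_end, size_t *len)
{
    const uint8_t *p = tuple_end;
    uint32_t    value = 0;
    unsigned    shift = 0;
    uint8_t     b;

    do
    {
        assert(shift < 32);     /* at most 5 bytes for a 32-bit value */
        b = *--p;
        value |= (uint32_t) (b & 0x7F) << shift;
        shift += 7;
    } while (b & 0x80);

    *len = (size_t) (tuple_end - p);
    return value;
}
```

Under this assumed encoding any partition number below 128 costs one
byte, which is the common case described above.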

> I agree with trying to avoid having padding 'in the wrong place' and if
> it makes some indexes smaller, great, even if they're unlikely to be
> interesting in the vast majority of cases, they may still exist out
> there.  Of course, this is provided that it doesn't overly complicate
> the code, but it sounds like it wouldn't be too bad in this case.

Here is what it took:

* Removed the "conservative" MAXALIGN() within index_form_tuple(),
bringing it in line with heap_form_tuple(), which only MAXALIGN()s so
that the first attribute in the tuple's data area can safely be accessed
on alignment-picky platforms, but doesn't do the same with data_len.

* Removed most of the MAXALIGN()s from nbtinsert.c, except one that
considers whether a page split is required.

* Didn't change the nbtsplitloc.c code, because we need to assume
MAXALIGN()'d space quantities there. We continue to not trust the
reported tuple length to be MAXALIGN()'d, which is now essential
rather than just defensive.

* Removed MAXALIGN()s within _bt_truncate(), and SHORTALIGN()'d the
whole tuple size in the case where the new pivot tuple requires a heap
TID representation. We access TIDs as three 2-byte integers, so this is
necessary for alignment-picky platforms.
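The size arithmetic behind the list above can be sketched as follows.
This is only an illustration: the alignment widths (8 for MAXALIGN, 2
for SHORTALIGN) are assumed typical values, the 6-byte TID follows from
"three 2-byte integers", and the sketch_* helpers are hypothetical
stand-ins, not the actual nbtree code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed widths; in PostgreSQL these come from the platform's configuration. */
#define SKETCH_MAXIMUM_ALIGNOF 8
#define SKETCH_ALIGNOF_SHORT   2
#define SKETCH_TID_SIZE        6    /* three 2-byte integers */

/* Round len up to the next multiple of alignval (a power of two). */
size_t
sketch_align(size_t alignval, size_t len)
{
    assert(alignval != 0 && (alignval & (alignval - 1)) == 0);
    return (len + (alignval - 1)) & ~(alignval - 1);
}

/* "Conservative" sizing: MAXALIGN the tuple and the appended TID separately. */
size_t
sketch_pivot_size_maxaligned(size_t truncated_size)
{
    return sketch_align(SKETCH_MAXIMUM_ALIGNOF, truncated_size) +
        sketch_align(SKETCH_MAXIMUM_ALIGNOF, SKETCH_TID_SIZE);
}

/* Sizing as described above: SHORTALIGN the whole tuple size, TID included. */
size_t
sketch_pivot_size_shortaligned(size_t truncated_size)
{
    return sketch_align(SKETCH_ALIGNOF_SHORT,
                        truncated_size + SKETCH_TID_SIZE);
}

/* The TID occupies the last 6 bytes, so it is found from the tuple's end. */
size_t
sketch_tid_offset(size_t tuple_size)
{
    return tuple_size - SKETCH_TID_SIZE;
}
```

For a truncated pivot of 13 bytes, the conservative sizing gives
16 + 8 = 24 bytes, while the SHORTALIGN()'d sizing gives 20 bytes with
the TID starting at offset 14, an even offset and therefore safe for
2-byte accesses.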

I will pursue this as a project for PostgreSQL 13. It doesn't affect
on-disk compatibility, because BTreeTupleGetHeapTID() works just as
well with either the existing scheme, or this new one. Having the
"real" tuple length available will make it easier to implement "true"
suffix truncation, where we truncate *within* a text attribute (i.e.
generate a new, shorter value using new opclass infrastructure).

-- 
Peter Geoghegan
