Re: WIP: Covering + unique indexes.

Alexander Korotkov Mon, 26 Mar 2018 03:12:07 -0700

On Sun, Mar 25, 2018 at 1:47 AM, Peter Geoghegan <p...@bowt.ie> wrote:


> I was going to say that you could just treat the low bit in the t_tid
> offset as representing "see catalog entry". My first idea was that
> nothing would have to change about the existing format, since internal
> page items already have only the low bit set within their offset.
> However, I now see that that won't really work, because we don't
> change the offset in high keys when they're copied from a real item
> during a page split. Whatever we do, it has to work equally well for
> all "separator keys" -- that is, it must work for both downlinks in
> internal pages, and all high keys (including high keys at the leaf
> level).
>

OK.


> A good solution is to use the unused 13th t_bit. If hash can have a
> INDEX_MOVED_BY_SPLIT_MASK, then nbtree can have a INDEX_ALT_TID_MASK.
> This avoids a BTREE_VERSION bump, and allows us to deal with the
> highkey offset issue. Actually, it's even more flexible than that --
> it can work with ordinary leaf tuples in the future, too. That is, we
> can eventually implement prefix truncation and deduplication at the
> leaf level using this representation, since there is nothing that
> limits INDEX_ALT_TID_MASK IndexTuples to "separator keys".
>
> The main difference between this approach to leaf prefix
> truncation/compression/deduplication, and the GIN entry tree's posting
> list representation would be that it wouldn't have to be
> super-optimized for duplicates, at the expense of more common case for
> regular nbtree indexes -- having few or no duplicates. A btree_gin
> index on pgbench_accounts(aid) looks very similar to an equivalent
> nbtree index if you just compare internal pages from each, but they
> look quite different at the leaf level, where GIN has 24 byte
> IndexTuples instead of 16 bytes IndexTuples. Of course, this is
> because the leaf pages have posting lists that can never be simple
> heap pointer TIDs.
>

Right, btree_gin is much smaller than regular btree when there are a lot
of duplicates.  When there is no duplicates then btree_gin becomes larger
than regular btree, because gin stores single item pointer less compact
than btree.

A secondary goal of this INDEX_ALT_TID_MASK representation should be
> that it won't even be necessary to know that an IndexTuple is
> contained within a leaf page rather than an index page (again, unlike
> GIN). I'm pretty confident that we can have a truly universal
> IndexTuple representation for nbtree, while supporting all of these
> standard optimizations.
>
> Sorry for going off in a tangent, but I think it's somewhat necessary
> to have a strategy here. Of course, we don't have to get everything
> right now, but we should be looking in this direction whenever we talk
> about on-disk nbtree changes.
>

So, as I get you're proposing to introduce INDEX_ALT_TID_MASK flag
which would indicate that we're storing something special in the t_tid
offset.  And that should help us not only for covering indexes, but also for
further btree enhancements including suffix truncation.  What exactly do
you propose to store into t_tid offset when INDEX_ALT_TID_MASK flag
is set?  Is it number of attributes in this particular index tuple?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: WIP: Covering + unique indexes.

Reply via email to