Re: PG 13 release notes, first draft

Peter Geoghegan Thu, 07 May 2020 11:55:25 -0700

Hi Bruce,

On Mon, May 4, 2020 at 8:16 PM Bruce Momjian <br...@momjian.us> wrote:
> I have committed the first draft of the PG 13 release notes.  You can
> see them here:
>
>         https://momjian.us/pgsql_docs/release-13.html


I see that you have an entry for the deduplication feature:

"More efficiently store duplicates in btree indexes (Anastasia
Lubennikova, Peter Geoghegan)"

I would like to provide some input on this. Fortunately it's much
easier to explain than the B-Tree work that went into Postgres 12. I
think that you should point out that deduplication works by storing
the duplicates in the obvious way: Only storing the key once per
distinct value (or once per distinct combination of values in the case
of multi-column indexes), followed by an array of TIDs (i.e. a posting
list). Each TID points to a separate row in the table.
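
To make the layout concrete, here is a minimal sketch of the idea in Python. It is only an illustration, not the actual nbtree code: the function name, the plain (block, offset) tuples standing in for TIDs, and the sample data are all assumptions made up for this example.

```python
def deduplicate(index_tuples):
    """Collapse (key, tid) pairs into (key, [tid, ...]) posting lists.

    Assumes index_tuples is already sorted by key, as the entries on a
    B-Tree leaf page would be. A key may be a single value or a tuple of
    values (the multi-column case).
    """
    posting_lists = []
    for key, tid in index_tuples:
        if posting_lists and posting_lists[-1][0] == key:
            # Same key as the previous entry: append the TID only,
            # rather than storing the key again.
            posting_lists[-1][1].append(tid)
        else:
            # First time we see this key: store it once, start its TID array.
            posting_lists.append((key, [tid]))
    return posting_lists


# Hypothetical leaf-page contents: one (key, TID) entry per table row.
entries = [
    ("apple", (0, 1)),
    ("apple", (0, 2)),
    ("apple", (3, 7)),
    ("pear", (1, 4)),
]

compact = deduplicate(entries)
keys_before = len(entries)   # the key is stored once per row
keys_after = len(compact)    # the key is stored once per distinct value
```

In this toy input the key is stored four times before and twice after, while all four TIDs are kept, so every row is still reachable from the index.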

It won't be uncommon for this to make indexes as much as 3x smaller
(it depends on a number of different factors that you can probably
guess). I wrote a summary of how it works for power users in the
B-Tree documentation chapter, which you might want to link to in the
release notes:

https://www.postgresql.org/docs/devel/btree-implementation.html#BTREE-DEDUPLICATION

Users who pg_upgrade will have to REINDEX before their existing
indexes can actually use the feature, regardless of which version
they've upgraded from. There are also a few caveats, such as limits on
the data types that can use deduplication -- see the documentation
section I linked to.

Finally, you might want to note that the feature is enabled by
default, and can be disabled by setting the "deduplicate_items" index
storage option to "off". (We have yet to make a final decision on
whether the feature should stay enabled by default in the first stable
release of Postgres 13, though -- I have an open item for that.)

-- 
Peter Geoghegan
