On Wed, Nov 13, 2019 at 11:33 AM Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Nov 12, 2019 at 6:22 PM Peter Geoghegan <p...@bowt.ie> wrote:
> > * Disabled deduplication in system catalog indexes by deeming it
> > generally unsafe.
>
> I (continue to) think that deduplication is a terrible name, because
> you're not getting rid of the duplicates. You are using a compressed
> representation of the duplicates.
"Deduplication" never means that you get rid of duplicates. According
to Wikipedia's deduplication article: "Whereas compression algorithms
identify redundant data inside individual files and encodes this
redundant data more efficiently, the intent of deduplication is to
inspect large volumes of data and identify large sections – such as
entire files or large sections of files – that are identical, and
replace them with a shared copy".

That seemed like a good fit for what this patch does. We're concerned
with a specific, simple kind of redundancy. Also:

* From the user's point of view, we're merging together what they'd
call duplicates. They don't really think of the heap TID as part of
the key.

* The term "compression" suggests a decompression penalty when
reading, which is not the case here.

* The term "compression" confuses the feature added by the patch with
TOAST compression. Now we may have two very different varieties of
compression in the same index.

Can you suggest an alternative?

-- 
Peter Geoghegan
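To make the distinction concrete, here is a toy sketch (not PostgreSQL
code; the keys and TID tuples are made up) of what "deduplication" in
this sense does: N index entries with equal keys are replaced by one
shared key plus a list of heap TIDs, and reading the result back
requires no decompression step.

```python
def deduplicate(entries):
    """Merge (key, tid) entries with equal keys into (key, [tids]).

    This mirrors the "replace identical sections with a shared copy"
    idea from the Wikipedia definition: the duplicates are still fully
    represented (every TID survives), just stored once per key.
    """
    merged = {}
    for key, tid in entries:
        merged.setdefault(key, []).append(tid)
    return sorted(merged.items())

# Three "apple" duplicates collapse into one key with a shared TID list.
entries = [("apple", (1, 1)), ("apple", (1, 2)),
           ("pear", (2, 1)), ("apple", (3, 4))]
print(deduplicate(entries))
```

A general-purpose compressor, by contrast, would transform the bytes
themselves and force every reader to decode them first.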