Minor issue with deduplication's single value strategy

Peter Geoghegan Wed, 17 Jun 2020 15:49:16 -0700

nbtdedup.c's "single value strategy" is used when we have a leaf page
that is full of duplicates of the same value. We infer that we will
have many more in the future, and cooperate with the "single value
strategy" used within nbtsplitloc.c.  Deduplication should create 6
large posting list tuples that will remain on the page after an
anticipated page split, plus some remaining non-dedup'd regular tuples
that go on the new right page. So nbtdedup.c anticipates how an
eventual page split will need to work to keep space utilization high,
but not too high (though only in this specific "single value strategy"
case).


(Note that the choice of 6 posting list tuples is arbitrary -- it
seemed like limiting the size of posting list tuples to 1/3 of a page
(the long established maximum) was a bit too aggressive, so the
posting list tuple size limit was halved to 1/6 of the page.)

Sometimes, an append-only inserter of low cardinality data (just a few
distinct values) can leave some "single value" pages with perhaps 8 or
9 tuples -- not the intended 7 (i.e. not the intended 6 tuples plus 1
high key). This isn't really a problem. It can only happen when a
plain tuple is "wedged" between two existing max-sized posting list
tuples, where we cannot merge the plain tuple with either
adjoining/neighboring posting list tuple (since that violates the 1/6
of a page limit on posting list tuple size). This scenario is rare,
and in any case the space utilization is almost exactly what it's
supposed to be in the end (the BTREE_SINGLEVAL_FILLFACTOR "fill
factor" that was added to Postgres 12 is still respected in the
presence of deduplication in Postgres 13). This much is okay.

However, I noticed that it's also possible to confuse "single value
strategy" in nbtdedup.c into thinking that it has already observed 6
"max sized" tuples, when in fact it has not. This can lead to the
occasional page that contains perhaps 50 or 70 tuples. This is rare
enough that you have to really be paying attention to the structure of
the index to notice it -- most of my test case indexes that use
"single value strategy" a lot aren't even affected. I'm not okay with
this, because it's not how nbtdedup.c is intended to work -- clearly
nbtdedup.c gets confused as a consequence of the "plain tuple gets
wedged" issue. I'm not okay with this.

Attached patch nails down the behavior of single value strategy. This
slightly improves the space utilization in a small minority of my test
case indexes, though not by enough to matter. For example, the TPC-H
idx_customer_nationkey index goes from 1193 pages to 1182 pages. This
is an utterly insignificant issue, but there is no reason to allow it.
When I think about space utilization, I don't just look at the index
overall -- I also expect a certain amount of consistency within
related subsets of the index's key space.

-- 
Peter Geoghegan

v1-0001-Fix-nbtdedup.c-single-value-strategy-issue.patch
Description: Binary data

Minor issue with deduplication's single value strategy

Reply via email to