Re: Feature request for adoptive indexes

Robert Haas Mon, 01 Nov 2021 13:06:43 -0700

On Tue, Oct 26, 2021 at 11:11 AM Tomas Vondra
<[email protected]> wrote:
> If I get what you propose, you want to have a "top" tree for (job, nlp,
> year), which "splits" the data set into subsets of ~5000-7000 rows. And
> then for each subset you want a separate "small" trees on each of the
> other columns, so in this case three trees.
>
> Well, the problem with this is pretty obvious - each of the small trees
> requires separate copies of the leaf pages. And remember, in a btree the
> internal pages are usually less than 1% of the index, so this pretty
> much triples the size of the index. And if you insert a row into the
> index, it has to insert the item pointer into each of the small trees,
> likely requiring a separate I/O for each.
>
> So I'd bet this is not any different from just having three separate
> indexes - it doesn't save space, doesn't save I/O, nothing.


I agree. In a lot of cases, it's actually useful to define the index
on fewer columns, like (job, nlp, year) or even just (job, nlp) or
even just (job) because it makes the index so much smaller and that's
pretty important. If you have enough duplicate entries in a (job, nlp,
year) index to justify create a (job, nlp, year, sequence) index, the
number of distinct (job, nlp, year) tuples has to be small compared to
the number of (job, nlp, year, sequence) tuples - and that means that
you wouldn't actually save much by combining your (job, nlp, year,
sequence) index with a (job, nlp, year, other-stuff) index. As you
say, the internal pages aren't the problem.

I don't intend to say that there's no possible use for this kind of
technology. Peter G. says that people are writing research papers
about that and they probably wouldn't be doing that unless they'd
found some case where it's a big win. But this example seems extremely
unconvincing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Feature request for adoptive indexes

Reply via email to