Re: WIP: BRIN multi-range indexes

Tomas Vondra Tue, 26 Jan 2021 14:59:27 -0800



On 1/26/21 7:52 PM, John Naylor wrote:

On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
 >
 >
 > On 1/23/21 12:27 AM, John Naylor wrote:
 > > Still, it would be great if multi-minmax can be a drop-in replacement. I
 > > know there was a sticking point of a distance function not being
 > > available on all types, but I wonder if that can be remedied or worked
 > > around somehow.
 > >
 >
 > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in
 > replacement for minmax (essentially, using these opclasses as the
 > default ones, with the option to switch back to plain minmax). I'm not
 > convinced we should do that, though. Imagine you have minmax indexes in
 > your existing DB, it's working perfectly fine, and then we come and just
 > silently change that during dump/restore. Is there some past example
 > when we did something similar and it turned out to be OK?
I was assuming pg_dump can be taught to insert explicit opclasses for minmax indexes, so that upgrade would not cause surprises. If that's true, only new indexes would have the different default opclass.

Maybe, I suppose we could do that. But I always found such changes happening silently in the background a bit suspicious, because it may be quite confusing. I certainly wouldn't expect such a difference between creating a new index and an index created by dump/restore. Did we make such changes in the past? That might be a precedent, but I don't recall any example ...

 > As for the distance functions, I'm pretty sure there are data types
 > without "natural" distance - like most strings, for example. We could
 > probably invent something, but the question is how much we can rely on
 > it working well enough in practice.
 >
 > Of course, is minmax even the right index type for such data types?
 > Strings are usually "labels" and not queried using range queries,
 > although sometimes people encode stuff as strings (but then it's very
 > unlikely we'll define the distance definition well). So maybe for those
 > types a hash / bloom would be a better fit anyway.

Right.

 > But I do have an idea - maybe we can do without distances, in those
 > cases. Essentially, the primary issue with minmax indexes is outliers, so
 > what if we simply sort the values, keep one range in the middle and as
 > many single points on each tail?

That's an interesting idea. I think it would be a nice bonus to try to do something along these lines. On the other hand, I'm not the one volunteering to do the work, and the patch is useful as is.

IMO it's a fairly small amount of code, so I'll take a stab at it in the next version of the patch.
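
To make the "no distance function" idea concrete, here is a rough Python sketch of how I read it: sort the values, keep one range in the middle, and spend the remaining entries on single points at each tail. This is only an illustration, not code from the patch. The function names, the max_entries parameter, and the even split between the two tails are all assumptions made up for this example. Only an ordering on the values is needed, so it works for strings too.

```python
def summarize_without_distance(values, max_entries):
    """Hypothetical sketch: tail points plus one middle range, no distance needed.

    max_entries is an assumed budget; the middle range counts as one entry
    and the rest is split evenly between the low and high tails.
    """
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    ordered = sorted(set(values))
    if len(ordered) <= max_entries:
        # Everything fits as exact points, so no range is required.
        return ordered, None, []
    tail = (max_entries - 1) // 2
    low_points = ordered[:tail]
    high_points = ordered[len(ordered) - tail:]
    middle = ordered[tail:len(ordered) - tail]
    return low_points, (middle[0], middle[-1]), high_points


def may_contain(summary, value):
    """Return True if the summary cannot rule out the value."""
    low_points, middle, high_points = summary
    if value in low_points or value in high_points:
        return True
    return middle is not None and middle[0] <= value <= middle[1]


# Two outliers (1 and 1000000) around a tight cluster of values.
summary = summarize_without_distance([1, 100, 101, 102, 103, 104, 1000000], 3)
print(summary)                    # ([1], (100, 104), [1000000])
print(may_contain(summary, 500))  # False; a plain [1, 1000000] range would say True
```

With a single min/max range the outliers 1 and 1000000 would force the summary to cover everything in between; keeping them as separate points lets the middle range stay narrow.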



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
