On 1/26/21 7:52 PM, John Naylor wrote:
On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra
<tomas.von...@enterprisedb.com>
wrote:
>
>
> On 1/23/21 12:27 AM, John Naylor wrote:
> > Still, it would be great if multi-minmax can be a drop-in replacement.
> > I know there was a sticking point of a distance function not being
> > available on all types, but I wonder if that can be remedied or worked
> > around somehow.
> >
>
> Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in
> replacement for minmax (essentially, using these opclasses as the
> default ones, with the option to switch back to plain minmax). I'm not
> convinced we should do that, though. Imagine you have minmax indexes in
> your existing DB, it's working perfectly fine, and then we come and just
> silently change that during dump/restore. Is there some past example
> when we did something similar and it turned out to be OK?
I was assuming pg_dump can be taught to insert explicit opclasses for
minmax indexes, so that upgrade would not cause surprises. If that's
true, only new indexes would have the different default opclass.
Maybe, I suppose we could do that. But I always found such changes
happening silently in the background a bit suspicious, because it may be
quite confusing. I certainly wouldn't expect such a difference between
creating a new index and an index created by dump/restore. Did we do such
changes in the past? That might be a precedent, but I don't recall any
example ...
> As for the distance functions, I'm pretty sure there are data types
> without "natural" distance - like most strings, for example. We could
> probably invent something, but the question is how much we can rely on
> it working well enough in practice.
>
> Of course, is minmax even the right index type for such data types?
> Strings are usually "labels" and not queried using range queries,
> although sometimes people encode stuff as strings (but then it's very
> unlikely we'll define the distance well). So maybe for those
> types a hash / bloom would be a better fit anyway.
Right.
> But I do have an idea - maybe we can do without distances, in those
> cases. Essentially, the primary issue with minmax indexes is outliers, so
> what if we simply sort the values, keep one range in the middle and as
> many single points on each tail?
That's an interesting idea. I think it would be a nice bonus to try to
do something along these lines. On the other hand, I'm not the one
volunteering to do the work, and the patch is useful as is.
IMO it's a fairly small amount of code, so I'll take a stab at it in the
next version of the patch.
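
To make the distance-free idea concrete, here is a rough Python sketch of
what I have in mind. It is only an illustration, not code from the patch:
the names summarize, may_match and tail_points are made up for this
example, and integers stand in for any type that has an ordering but no
usable distance.

```python
from typing import List, Optional, Tuple

# (low single points, optional middle range, high single points)
Summary = Tuple[List[int], Optional[Tuple[int, int]], List[int]]


def summarize(values: List[int], tail_points: int) -> Summary:
    """Sort the values, keep `tail_points` exact points on each tail,
    and collapse everything in between into one [min, max] range.
    Only comparisons are used, no distance function."""
    ordered = sorted(set(values))
    if len(ordered) <= 2 * tail_points:
        # Few enough distinct values: keep all of them as exact points.
        return ordered, None, []
    low = ordered[:tail_points]
    high = ordered[len(ordered) - tail_points:]
    middle = ordered[tail_points:len(ordered) - tail_points]
    return low, (middle[0], middle[-1]), high


def may_match(summary: Summary, lo: int, hi: int) -> bool:
    """Return True if the query range [lo, hi] might contain a value
    covered by the summary (false positives allowed, no false negatives)."""
    low, middle, high = summary
    if any(lo <= p <= hi for p in low + high):
        return True
    return middle is not None and middle[0] <= hi and lo <= middle[1]


# One outlier on each side of an otherwise narrow set of values.
example = summarize([1, 100, 101, 102, 103, 5000], tail_points=1)
```

With a plain minmax summary the example above would collapse to the single
range [1, 5000], so a query for values between 200 and 300 could not skip
the page range. With the tails kept as single points, the same query is
ruled out. How many points to keep per tail is an open question here.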
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company