Re: Postgres picks suboptimal index after building of an extended statistics

Alexander Korotkov Thu, 21 Dec 2023 02:11:19 -0800

On Thu, Dec 21, 2023 at 10:41 AM Andrei Lepikhov
<[email protected]> wrote:
>
> On 18/12/2023 15:29, Alexander Korotkov wrote:
> > Also, there is a set of patches [7], [8], and [9], which makes the
> > optimizer consider path selectivity as long as path costs during the
> > path selection.  I've rechecked that none of these patches could resolve
> > the original problem described in [1].
> It is true. We accidentally mixed two different problems in one thread.
> >  Also, I think they are quite
> > tricky.  The model of our optimizer assumes that paths in the list
> > should be the different ways of getting the same result.  If we choose
> > the paths by their selectivity, that breaks this model.  I don't say
> > there is no way for this.  But if we do this, that would require
> > significant rethinking of our optimizer model and possible revision of a
> > significant part of it.
> I can't understand that. In [9] we just elaborate the COSTS_EQUAL case
> and establish final decision on more stable basis than a casual order of
> indexes in the list.


I took a closer look at the patch in [9].  I should drop my argument
about breaking the model, because add_path() already considers other
aspects than just costs.  But I have two more note about that patch:

1) It seems that you're determining the fact that the index path
should return strictly one row by checking path->rows <= 1.0 and
indexinfo->unique.  Is it really guaranteed that in this case quals
are matching unique constraint?  path->rows <= 1.0 could be just an
estimation error.  Or one row could be correctly estimated, but it's
going to be selected by some quals matching unique constraint and
other quals in recheck.  So, it seems there is a risk to select
suboptimal index due to this condition.

2) Even for non-unique indexes this patch is putting new logic on top
of the subsequent code.  How we can prove it's going to be a win?
That could lead, for instance, to dropping parallel-safe paths in
cases we didn't do so before.

Anyway, please start a separate thread if you're willing to put more
work into this.

> >  Anyway, I think if there is still interest in
> > this, that should be moved into a separate thread to keep this thread
> > focused on the problem described in [1].
> Agree. IMO, the problem of optimizer dependency on an order of indexes
> in the relation index list is more urgent for now.
> >
> > Finally, I'd like to note that the issue described in [1] is mostly the
> > selectivity estimation problem.  It could be solved by adding the
> > multi-column MCV statistics.  The patches published so far look more
> > like hacks for particular use cases rather than appropriate solutions.
> > It still looks promising to me to use the knowledge of unique
> > constraints during selectivity estimation [10].  Even though it's hard
> > to implement and possibly implies some overhead, it fits the current
> > model.  I also think unique contracts could probably be used in some way
> > to improve estimates even when there is no full match.
> I have tried to use the knowledge about unique indexes in the
> selectivity estimation routine. But it looks invasive and adds a lot of
> overhead.

I got it.  But it doesn't look enough to decide this is no way.  Could
you, please, share some of your results?  It might happen that we just
need to rework some of data structures to make this information more
easily accessible at selectivity estimation stage.

------
Regards,
Alexander Korotkov

Re: Postgres picks suboptimal index after building of an extended statistics

Reply via email to