On Thu, Dec 21, 2023 at 10:41 AM Andrei Lepikhov <a.lepik...@postgrespro.ru> wrote: > > On 18/12/2023 15:29, Alexander Korotkov wrote: > > Also, there is a set of patches [7], [8], and [9], which makes the > > optimizer consider path selectivity as long as path costs during the > > path selection. I've rechecked that none of these patches could resolve > > the original problem described in [1]. > It is true. We accidentally mixed two different problems in one thread. > > Also, I think they are quite > > tricky. The model of our optimizer assumes that paths in the list > > should be the different ways of getting the same result. If we choose > > the paths by their selectivity, that breaks this model. I don't say > > there is no way for this. But if we do this, that would require > > significant rethinking of our optimizer model and possible revision of a > > significant part of it. > I can't understand that. In [9] we just elaborate the COSTS_EQUAL case > and establish final decision on more stable basis than a casual order of > indexes in the list.
I took a closer look at the patch in [9]. I should drop my argument about breaking the model, because add_path() already considers other aspects than just costs. But I have two more note about that patch: 1) It seems that you're determining the fact that the index path should return strictly one row by checking path->rows <= 1.0 and indexinfo->unique. Is it really guaranteed that in this case quals are matching unique constraint? path->rows <= 1.0 could be just an estimation error. Or one row could be correctly estimated, but it's going to be selected by some quals matching unique constraint and other quals in recheck. So, it seems there is a risk to select suboptimal index due to this condition. 2) Even for non-unique indexes this patch is putting new logic on top of the subsequent code. How we can prove it's going to be a win? That could lead, for instance, to dropping parallel-safe paths in cases we didn't do so before. Anyway, please start a separate thread if you're willing to put more work into this. > > Anyway, I think if there is still interest in > > this, that should be moved into a separate thread to keep this thread > > focused on the problem described in [1]. > Agree. IMO, the problem of optimizer dependency on an order of indexes > in the relation index list is more urgent for now. > > > > Finally, I'd like to note that the issue described in [1] is mostly the > > selectivity estimation problem. It could be solved by adding the > > multi-column MCV statistics. The patches published so far look more > > like hacks for particular use cases rather than appropriate solutions. > > It still looks promising to me to use the knowledge of unique > > constraints during selectivity estimation [10]. Even though it's hard > > to implement and possibly implies some overhead, it fits the current > > model. I also think unique contracts could probably be used in some way > > to improve estimates even when there is no full match. > I have tried to use the knowledge about unique indexes in the > selectivity estimation routine. But it looks invasive and adds a lot of > overhead. I got it. But it doesn't look enough to decide this is no way. Could you, please, share some of your results? It might happen that we just need to rework some of data structures to make this information more easily accessible at selectivity estimation stage. ------ Regards, Alexander Korotkov