Re: disfavoring unparameterized nested loops

2023-09-20 Thread Lepikhov Andrei
On Wed, Sep 20, 2023, at 4:49 PM, David Rowley wrote: > On Wed, 20 Sept 2023 at 19:56, Andrey Lepikhov > wrote: >> What could you say about a different way: hybrid join? In MS SQL Server, >> they have such a feature [1], and, according to their description, it >> requires low overhead. They sta

Re: disfavoring unparameterized nested loops

2023-09-20 Thread David Rowley
On Wed, 20 Sept 2023 at 19:56, Andrey Lepikhov wrote: > What could you say about a different way: hybrid join? In MS SQL Server, > they have such a feature [1], and, according to their description, it > requires low overhead. They start from HashJoin and switch to NestLoop > if the inner input con

Re: disfavoring unparameterized nested loops

2023-09-20 Thread Andrey Lepikhov
On 29/9/2022 21:32, Benjamin Coutu wrote: I'd like to revamp this important discussion. As is well described in this fairly recent paper here https://www.vldb.org/pvldb/vol9/p204-leis.pdf (which also looks at Postgres) "estimation errors quickly grow as the number of joins increases, and that

Re: disfavoring unparameterized nested loops

2022-10-02 Thread Peter Geoghegan
On Fri, Sep 30, 2022 at 3:19 PM Benjamin Coutu wrote: > > For all I know you might be onto something. But it really seems > > independent to me. > > Yeah, I‘m sorry if I highjacked this thread for something related but > technically different. I certainly wouldn't say that you hijacked the threa

Re: disfavoring unparameterized nested loops

2022-10-02 Thread Peter Geoghegan
On Sun, Oct 2, 2022 at 3:43 AM Robert Haas wrote: > On Fri, Sep 30, 2022 at 2:24 PM Peter Geoghegan wrote: > > It's not just that the risks are ludicrously high, of course. The > > potential benefits must *also* be very low. It's both factors, > > together. > > Hmm, maybe. But it also wouldn't su

Re: disfavoring unparameterized nested loops

2022-10-02 Thread Robert Haas
On Fri, Sep 30, 2022 at 2:24 PM Peter Geoghegan wrote: > It's not just that the risks are ludicrously high, of course. The > potential benefits must *also* be very low. It's both factors, > together. Hmm, maybe. But it also wouldn't surprise me very much if someone can come up with a test case wh

Re: disfavoring unparameterized nested loops

2022-09-30 Thread Benjamin Coutu
> Sure, but the model isn't the problem here, really -- not to me. The > problem is that the planner can in some cases choose a plan that is > inherently unreasonable, at least in practical terms. You're talking > about uncertainties. But I'm actually talking about the opposite thing > -- certainty

Re: disfavoring unparameterized nested loops

2022-09-30 Thread Peter Geoghegan
On Thu, Sep 29, 2022 at 11:44 PM Benjamin Coutu wrote: > I think these things are orthogonal. I agree that they're orthogonal. I just meant that execution time strategies seem underexplored in general. > No matter how good the cost model ever gets, we will always have degenerate > cases. Sure,

Re: disfavoring unparameterized nested loops

2022-09-30 Thread Peter Geoghegan
On Thu, Sep 29, 2022 at 9:00 PM David Rowley wrote: > I understand that what you propose would be a fast way to fix this > issue. However, if we went and changed the join path creation code to > not add non-parameterised nested loop paths when other paths exist, > then how could we ever dare to pu

Re: disfavoring unparameterized nested loops

2022-09-30 Thread Peter Geoghegan
On Fri, Sep 30, 2022 at 10:43 AM Robert Haas wrote: > But it's unnecessary to do this computation at runtime for every separate > unparameterized nested loop: we can do it right here, in a generic > way, for every such loop. It's not just that the risks are ludicrously high, of course. The potent

Re: disfavoring unparameterized nested loops

2022-09-30 Thread Robert Haas
On Thu, Sep 29, 2022 at 7:46 PM Tom Lane wrote: > Agreed, but dealing with uncertainty in those numbers is an enormous > task if you want to do it right. "Doing it right", IMV, would start > out by extending all the selectivity estimation functions to include > error bars; then we could have erro

Re: disfavoring unparameterized nested loops

2022-09-30 Thread Benjamin Coutu
> Actually, if we wanted to improve things in this area, we should have a > set of queries that don't chose optimal plans we can test with. We used > to see them a lot before we had extended statistics, but I don't > remember seeing many recently, let alone a collection of them. I guess > that is

Re: disfavoring unparameterized nested loops

2022-09-30 Thread Benjamin Coutu
> Agreed, but dealing with uncertainty in those numbers is an enormous > task if you want to do it right. "Doing it right", IMV, would start > out by extending all the selectivity estimation functions to include > error bars; then we could have error bars on rowcount estimates and > then costs; th

Re: disfavoring unparameterized nested loops

2022-09-29 Thread Benjamin Coutu
> In general I suspect that we'd be better off focussing on mitigating > the impact at execution time. There are at least a few things that we > could do there, at least in theory. Mostly very ambitious, long term > things. I think these things are orthogonal. No matter how good the cost model e

Re: disfavoring unparameterized nested loops

2022-09-29 Thread Benjamin Coutu
> Right. But that seems fraught with difficulty. I suspect that the > costs that the planner attributes to each plan often aren't very > reliable in any absolute sense, even when everything is working very > well by every available metric. Even a very noisy cost model with > somewhat inaccurate sel

Re: disfavoring unparameterized nested loops

2022-09-29 Thread David Rowley
On Fri, 30 Sept 2022 at 13:06, Peter Geoghegan wrote: > I like the idea of just avoiding unparameterized nested loop joins > altogether when an "equivalent" hash join plan is available because > it's akin to an execution-time mitigation, despite the fact that it > happens during planning. While it

Re: disfavoring unparameterized nested loops

2022-09-29 Thread Bruce Momjian
On Thu, Sep 29, 2022 at 07:51:47PM -0400, Bruce Momjian wrote: > On Thu, Sep 29, 2022 at 07:46:18PM -0400, Tom Lane wrote: > > Agreed, but dealing with uncertainty in those numbers is an enormous > > task if you want to do it right. "Doing it right", IMV, would start > > out by extending all the s

Re: disfavoring unparameterized nested loops

2022-09-29 Thread Peter Geoghegan
On Thu, Sep 29, 2022 at 4:46 PM Tom Lane wrote: > Agreed, but dealing with uncertainty in those numbers is an enormous > task if you want to do it right. "Doing it right", IMV, would start > out by extending all the selectivity estimation functions to include > error bars; then we could have erro

Re: disfavoring unparameterized nested loops

2022-09-29 Thread Bruce Momjian
On Thu, Sep 29, 2022 at 07:46:18PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > I think the point the original poster as making, and I have made in the > > past, is that even of two optimizer costs are the same, one might be > > more penalized by misestimation than the other, and we don't ha

Re: disfavoring unparameterized nested loops

2022-09-29 Thread Tom Lane
Bruce Momjian writes: > I think the point the original poster as making, and I have made in the > past, is that even of two optimizer costs are the same, one might be > more penalized by misestimation than the other, and we don't have a good > way of figuring that into our plan choices. Agreed, b

Re: disfavoring unparameterized nested loops

2022-09-29 Thread Peter Geoghegan
On Thu, Sep 29, 2022 at 4:27 PM Bruce Momjian wrote: > I think the point the original poster as making, and I have made in the > past, is that even of two optimizer costs are the same, one might be > more penalized by misestimation than the other, and we don't have a good > way of figuring that in

Re: disfavoring unparameterized nested loops

2022-09-29 Thread Bruce Momjian
On Thu, Sep 29, 2022 at 04:12:06PM -0700, Peter Geoghegan wrote: > Offhand I'd say it's more likely to be too complicated. Without > meaning to sound glib, the first question that occurs to me is "will > we also need a conviction multiplier conviction multiplier?". Anything > like that is going to

Re: disfavoring unparameterized nested loops

2022-09-29 Thread Peter Geoghegan
On Thu, Sep 29, 2022 at 7:32 AM Benjamin Coutu wrote: > Also, we can expand the multiplier whenever we fall back to using the default > cardinality constant as surely all bets are off at that point - we should > definitely treat nested loop joins as out of favor in this instance and that > coul

Re: disfavoring unparameterized nested loops

2022-09-29 Thread Benjamin Coutu
I'd like to revamp this important discussion. As is well described in this fairly recent paper here https://www.vldb.org/pvldb/vol9/p204-leis.pdf (which also looks at Postgres) "estimation errors quickly grow as the number of joins increases, and that these errors are usually the reason for bad

Re: disfavoring unparameterized nested loops

2022-01-13 Thread Robert Haas
On Wed, Jan 12, 2022 at 7:20 PM Peter Geoghegan wrote: > Should I take it that you've dropped this project? I was rather hoping > that you'd get back to it, FWIW. Honestly, I'd forgotten all about it. While in theory I'd like to do something about this, but I just have too many other things goin

Re: disfavoring unparameterized nested loops

2022-01-12 Thread Peter Geoghegan
On Tue, Jun 22, 2021 at 4:13 AM Robert Haas wrote: > I think that's a reasonable request. I'm not sure that I believe it's > 100% necessary, but it's certainly an improvement on a technical > level, and given that the proposed change could impact quite a lot of > plans, it's fair to want to see so

Re: disfavoring unparameterized nested loops

2021-08-02 Thread Mike Klaas
I think that it is worth paying more than nothing to avoid plans that are so far from optimal that they might as well take forever to execute I recently came across this article from 2016 that expounded upon a bad plan of the sort discussed in this thread: https://heap.io/blog/when-to-avoid-jsonb

Re: disfavoring unparameterized nested loops

2021-06-22 Thread Peter Geoghegan
On Tue, Jun 22, 2021 at 2:53 AM Tomas Vondra wrote: > Yeah, I like the insurance analogy - it gets to the crux of the problem, > because insurance is pretty much exactly about managing risk. The user's exposure to harm is what truly matters. I admit that that's very hard to quantify, but we shoul

Re: disfavoring unparameterized nested loops

2021-06-22 Thread Robert Haas
On Mon, Jun 21, 2021 at 4:52 PM Tom Lane wrote: > I'm willing to take some flak if there's not an easy proof that the outer > scan is single-row, but I don't think we should just up and change it > for cases where there is. I think that's a reasonable request. I'm not sure that I believe it's 100

Re: disfavoring unparameterized nested loops

2021-06-22 Thread Tomas Vondra
On 6/22/21 2:25 AM, Peter Geoghegan wrote: > On Mon, Jun 21, 2021 at 4:51 PM Bruce Momjian wrote: >> There were a lot of interesting ideas in this thread and I want to >> analyze some of them. First, there is the common assumption (not >> stated) that over-estimating by 5% and underestimating

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Peter Geoghegan
On Mon, Jun 21, 2021 at 4:51 PM Bruce Momjian wrote: > There were a lot of interesting ideas in this thread and I want to > analyze some of them. First, there is the common assumption (not > stated) that over-estimating by 5% and underestimating by 5% cause the > same harm, which is clearly false

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Bruce Momjian
On Mon, Jun 21, 2021 at 12:52:39PM -0400, Tom Lane wrote: > You're throwing around a lot of pejorative adjectives there without > having bothered to quantify any of them. This sounds less like a sound > argument to me than like a witch trial. > > Reflecting for a bit on the ancient principle that

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Peter Geoghegan
On Mon, Jun 21, 2021 at 1:52 PM Tom Lane wrote: > You're ignoring the fact that the plan shape we generate now is in fact > *optimal*, and easily proven to be so, in some very common cases. As I've said I don't reject the idea that there is room for disagreement on the specifics. For example perh

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Tom Lane
Peter Geoghegan writes: > On Mon, Jun 21, 2021 at 10:14 AM Robert Haas wrote: >> The other thing is - the risk of a particular path doesn't matter in >> an absolute sense, only a relative one. In the particular case I'm on >> about here, we *know* there's a less-risky alternative. > Exactly! Thi

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Peter Geoghegan
On Mon, Jun 21, 2021 at 10:14 AM Robert Haas wrote: > Hmm, maybe I need to see an example of the sort of plan shape that you > have in mind. To me it feels like a comparison on a unique key ought > to use a *parameterized* nested loop. And it's also not clear to me > why a nested loop is actually

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Tom Lane
Robert Haas writes: > On Mon, Jun 21, 2021 at 1:38 PM Tom Lane wrote: >> NestLoop Join >> Join Filter: t1.x op t2.y >> -> Index Scan on t1_pkey >>Index Cond: t1.id = constant >> -> Seq Scan on t2 > Hmm, yeah, I guess that's possible. How much do you think this loses > as compared w

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Robert Haas
On Mon, Jun 21, 2021 at 1:38 PM Tom Lane wrote: > > Hmm, maybe I need to see an example of the sort of plan shape that you > > have in mind. To me it feels like a comparison on a unique key ought > > to use a *parameterized* nested loop. > > The unique-key comparison would be involved in the outer

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Tom Lane
I wrote: > ... As an example, > select * from t1, t2 where t1.id = constant and t1.x op t2.y; > where I'm not assuming much about the properties of "op". > This could be amenable to a plan like > NestLoop Join > Join Filter: t1.x op t2.y > -> Index Scan on t1_pkey >

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Tom Lane
Robert Haas writes: > On Mon, Jun 21, 2021 at 11:55 AM Tom Lane wrote: >> There are certainly cases where the optimizer can prove (in principle; >> it doesn't do so today) that a plan node will produce at most one row. >> They're hardly uncommon either: an equality comparison on a unique >> key,

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Robert Haas
On Mon, Jun 21, 2021 at 1:11 PM Peter Geoghegan wrote: > On Mon, Jun 21, 2021 at 9:52 AM Tom Lane wrote: > > > I'm not so sure that it is. The point isn't the risk, even if it could > > > be calculated. The point is the downsides of being wrong are huge and > > > pretty much unbounded, whereas th

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Robert Haas
On Mon, Jun 21, 2021 at 11:55 AM Tom Lane wrote: > There are certainly cases where the optimizer can prove (in principle; > it doesn't do so today) that a plan node will produce at most one row. > They're hardly uncommon either: an equality comparison on a unique > key, or a subquery with a simple

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Peter Geoghegan
On Mon, Jun 21, 2021 at 9:52 AM Tom Lane wrote: > > I'm not so sure that it is. The point isn't the risk, even if it could > > be calculated. The point is the downsides of being wrong are huge and > > pretty much unbounded, whereas the benefits of being right are tiny > > and bounded. It almost do

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Tom Lane
Tomas Vondra writes: > I've been thinking about this a bit recently and searching for papers > talking about this, and but it's not clear to me how to calculate the > risk (and propagate it through the plan) without making the whole cost > evaluation way more complicated / expensive :-( Yeah,

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Tom Lane
Peter Geoghegan writes: > On Mon, Jun 21, 2021 at 8:55 AM Tom Lane wrote: >> I'd be a lot happier if this proposal were couched around some sort >> of estimate of the risk of the outer side producing more than the >> expected number of rows. The arguments so far seem like fairly lame >> rational

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Tomas Vondra
On 6/21/21 5:55 PM, Tom Lane wrote: Peter Geoghegan writes: The heuristic that has the optimizer flat out avoids an unparameterized nested loop join is justified by the belief that that's fundamentally reckless. Even though we all agree on that much, I don't know when it stops being reckless

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Peter Geoghegan
On Mon, Jun 21, 2021 at 8:55 AM Tom Lane wrote: > There are certainly cases where the optimizer can prove (in principle; > it doesn't do so today) that a plan node will produce at most one row. > They're hardly uncommon either: an equality comparison on a unique > key, or a subquery with a simple

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Tom Lane
Peter Geoghegan writes: > The heuristic that has the optimizer flat out avoids an > unparameterized nested loop join is justified by the belief that > that's fundamentally reckless. Even though we all agree on that much, > I don't know when it stops being reckless and starts being "too risky > for

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Peter Geoghegan
On Mon, Jun 21, 2021 at 7:45 AM Robert Haas wrote: > On Mon, Jun 21, 2021 at 6:41 AM David Rowley wrote: > > For example, in an unparameterized Nested Loop that estimates the > > outer Path to have 1 row will cost an entire additional inner cost if > > there are 2 rows. With Hash Join the cost i

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Robert Haas
On Mon, Jun 21, 2021 at 6:41 AM David Rowley wrote: > For example, in an unparameterized Nested Loop that estimates the > outer Path to have 1 row will cost an entire additional inner cost if > there are 2 rows. With Hash Join the cost is just an additional > hashtable lookup, which is dead cheap

Re: disfavoring unparameterized nested loops

2021-06-21 Thread Kenneth Marshall
> > > > Most of the time when I see that happen it's down to either the > > selectivity of some correlated base-quals being multiplied down to a > > number low enough that we clamp the estimate to be 1 row. The other > > case is similar, but with join quals. > > If an estimate is lower than 1, t

Re: disfavoring unparameterized nested loops

2021-06-21 Thread John Naylor
On Tue, Jun 15, 2021 at 8:00 PM David Rowley wrote: > In my experience, the most common reason that the planner chooses > non-parameterized nested loops wrongly is when there's row > underestimation that says there's just going to be 1 row returned by > some set of joins. The problem often comes

Re: disfavoring unparameterized nested loops

2021-06-21 Thread David Rowley
On Wed, 16 Jun 2021 at 15:08, Peter Geoghegan wrote: > It seems important to distinguish between risk and uncertainty -- > they're rather different things. The short version is that you can > model risk but you cannot model uncertainty. This seems like a problem > of uncertainty to me. You might

Re: disfavoring unparameterized nested loops

2021-06-18 Thread John Naylor
On Fri, Jun 18, 2021 at 6:20 AM Ashutosh Bapat wrote: > I have not come across many papers which leverage this idea. Googling > "selectivity estimation confidence interval", does not yield many > papers. Although I found [1] to be using a similar idea. So may be > there's not merit in this idea,

Re: disfavoring unparameterized nested loops

2021-06-18 Thread Ashutosh Bapat
> > The problem I have with this idea is that I really don't know how to > properly calculate what the risk_factor should be set to. It seems > easy at first to set it to something that has the planner avoid these > silly 1-row estimate nested loop mistakes, but I think what we'd set > the risk_fa

Re: disfavoring unparameterized nested loops

2021-06-15 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 5:00 PM David Rowley wrote: > Most of the time when I see that happen it's down to either the > selectivity of some correlated base-quals being multiplied down to a > number low enough that we clamp the estimate to be 1 row. The other > case is similar, but with join qual

Re: disfavoring unparameterized nested loops

2021-06-15 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 6:15 PM David Rowley wrote: > On Wed, 16 Jun 2021 at 12:11, Peter Geoghegan wrote: > > Whether or not we throw the plan back at the planner or "really change > > our minds at execution time" seems like a distinction without a > > difference. > > What is "really change our

Re: disfavoring unparameterized nested loops

2021-06-15 Thread David Rowley
On Wed, 16 Jun 2021 at 12:11, Peter Geoghegan wrote: > Whether or not we throw the plan back at the planner or "really change > our minds at execution time" seems like a distinction without a > difference. What is "really change our minds at execution time"? Is that switch to another plan withou

Re: disfavoring unparameterized nested loops

2021-06-15 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 5:00 PM David Rowley wrote: > I don't really think we should solve this by having nodeNestloop.c > fall back on hashing when the going gets tough. Overloading our nodes > that way is not a sustainable thing to do. I'd rather see the > executor throw the plan back at the p

Re: disfavoring unparameterized nested loops

2021-06-15 Thread David Rowley
On Wed, 16 Jun 2021 at 05:09, Robert Haas wrote: > How to do that is not very clear. One very simple thing we could do > would be to introduce enable_nestloop=parameterized or > enable_parameterized_nestloop=false. That is a pretty blunt instrument > but the authors of the paper seem to have done

Re: disfavoring unparameterized nested loops

2021-06-15 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 12:31 PM Robert Haas wrote: > Yes, I think it is. Reading the paper really helped me crystallize my > thoughts about this, because when I've studied the problems myself, I > came, as you postulate here, to the conclusion that there's a lot of > stuff the planner does where

Re: disfavoring unparameterized nested loops

2021-06-15 Thread Robert Haas
On Tue, Jun 15, 2021 at 2:04 PM Peter Geoghegan wrote: > I guess that there is a hesitation to not introduce heuristics like > this because it doesn't fit into some larger framework that captures > risk -- it might be seen as an ugly special case. But isn't this > already actually kind of special,

Re: disfavoring unparameterized nested loops

2021-06-15 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 10:09 AM Robert Haas wrote: > How to do that is not very clear. One very simple thing we could do > would be to introduce enable_nestloop=parameterized or > enable_parameterized_nestloop=false. That is a pretty blunt instrument > but the authors of the paper seem to have do

disfavoring unparameterized nested loops

2021-06-15 Thread Robert Haas
>From time to time, someone tells me that they've configured enable_nestloop=false on postgresql.conf, which is a pretty bad idea since there are a significant number of cases where such plans are the only reasonable way of executing some query. However, it's no great secret that PostgreSQL's optim