Peter Geoghegan <p...@bowt.ie> writes:
> The heuristic that has the optimizer flat out avoids an
> unparameterized nested loop join is justified by the belief that
> that's fundamentally reckless. Even though we all agree on that much,
> I don't know when it stops being reckless and starts being "too risky
> for me, but not fundamentally reckless". I think that that's worth
> living with, but it isn't very satisfying.
There are certainly cases where the optimizer can prove (in principle;
it doesn't do so today) that a plan node will produce at most one row.
They're hardly uncommon either: an equality comparison on a unique
key, or a subquery with a simple aggregate function, come to mind.

In such cases, not only is this choice not reckless, but it's provably
superior to a hash join. So in the end this gets back to the planning
risk factor that we keep circling around but nobody quite wants to
tackle.

I'd be a lot happier if this proposal were couched around some sort
of estimate of the risk of the outer side producing more than the
expected number of rows. The arguments so far seem like fairly lame
rationalizations for not putting forth the effort to do that.

regards, tom lane
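The trade-off argued above can be illustrated with a toy cost comparison. This is a minimal sketch under invented assumptions, not PostgreSQL's actual cost model: an unparameterized nested loop is assumed to rescan the whole inner side once per outer row, while a hash join scans and hashes the inner side once and then probes once per outer row. `InnerRel`, `nestloop_cost`, `hashjoin_cost`, and `expected_cost` are hypothetical names, and weighting costs by a probability distribution over outer row counts is just one possible way to express the "risk" estimate being asked for.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class InnerRel:
    """Hypothetical cost inputs for the inner side of a join (arbitrary units)."""
    scan_cost: float         # cost of one full scan of the inner side
    hash_build_extra: float  # extra cost of hashing the inner side once
    probe_cost: float        # cost of one hash-table probe


def nestloop_cost(outer_rows: int, inner: InnerRel) -> float:
    # Unparameterized nested loop: rescan the whole inner side per outer row.
    return outer_rows * inner.scan_cost


def hashjoin_cost(outer_rows: int, inner: InnerRel) -> float:
    # Hash join: scan and hash the inner side once, then probe once per outer row.
    return inner.scan_cost + inner.hash_build_extra + outer_rows * inner.probe_cost


def expected_cost(
    cost_fn: Callable[[int, InnerRel], float],
    outcomes: Sequence[Tuple[int, float]],
    inner: InnerRel,
) -> float:
    # Weight each possible outer row count by its (assumed) probability.
    # `outcomes` is a list of (outer_rows, probability) pairs summing to 1.
    return sum(p * cost_fn(rows, inner) for rows, p in outcomes)


if __name__ == "__main__":
    inner = InnerRel(scan_cost=100.0, hash_build_extra=20.0, probe_cost=1.0)

    # Outer side provably yields at most one row (e.g. equality on a unique key).
    certain = [(1, 1.0)]
    # Same most-likely row count, but a 5% chance the estimate is badly wrong.
    risky = [(1, 0.95), (1000, 0.05)]

    for label, outcomes in (("certain", certain), ("risky", risky)):
        nl = expected_cost(nestloop_cost, outcomes, inner)
        hj = expected_cost(hashjoin_cost, outcomes, inner)
        print(f"{label}: nestloop={nl:.2f} hashjoin={hj:.2f}")
```

With these made-up numbers, the nested loop wins when one row is guaranteed (100 vs 121), since a single inner scan can never cost more than scanning plus hashing plus probing. Under the risky distribution, though, its expected cost is about 5095 against roughly 171 for the hash join, even though the most likely outcome is unchanged.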