On Mon, Jun 21, 2021 at 12:52:39PM -0400, Tom Lane wrote:
> You're throwing around a lot of pejorative adjectives there without
> having bothered to quantify any of them.  This sounds less like a sound
> argument to me than like a witch trial.
>
> Reflecting for a bit on the ancient principle that "the only good numbers
> in computer science are 0, 1, and N", it seems to me that we could make
> an effort to classify RelOptInfos as provably empty, provably at most one
> row, and others.  (This would be a property of relations, not individual
> paths, so it needn't bloat struct Path.)  We already understand about
> provably-empty rels, so this is just generalizing that idea a little.
> Once we know about that, at least for the really-common cases like unique
> keys, I'd be okay with a hard rule that we don't consider unparameterized
> nestloop joins with an outer side that falls in the "N" category.
> Unless there's no alternative, of course.
>
> Another thought that occurs to me here is that maybe we could get rid of
> the enable_material knob in favor of forcing (or at least encouraging)
> materialization when the outer side isn't provably one row.
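Tom's 0/1/N classification is concrete enough to sketch in a few lines.
Purely as an illustration --- the names and structure below are mine and
don't correspond to anything that exists in the planner --- the hard rule
he describes might look roughly like this:

    /*
     * Illustrative sketch only: classify each relation as provably
     * empty, provably at most one row, or "N" rows, and refuse to
     * generate an unparameterized nestloop whose outer side falls in
     * the "N" class unless no alternative join path exists.
     */
    #include <stdbool.h>
    #include <stdio.h>

    typedef enum RelRowClass
    {
        REL_ROWS_ZERO,          /* provably empty */
        REL_ROWS_AT_MOST_ONE,   /* e.g. lookup on a unique key */
        REL_ROWS_MANY           /* everything else: "N" */
    } RelRowClass;

    static bool
    consider_unparameterized_nestloop(RelRowClass outer_class,
                                      bool have_other_join_path)
    {
        if (outer_class == REL_ROWS_MANY && have_other_join_path)
            return false;       /* hard rule: don't even generate the path */
        return true;
    }

    int
    main(void)
    {
        printf("outer is N, hash join available  -> consider nestloop? %d\n",
               consider_unparameterized_nestloop(REL_ROWS_MANY, true));
        printf("outer is at most one row         -> consider nestloop? %d\n",
               consider_unparameterized_nestloop(REL_ROWS_AT_MOST_ONE, true));
        return 0;
    }

The "unless there's no alternative" escape hatch is what keeps this from
being a planner failure when a hash or merge join simply isn't possible.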
There were a lot of interesting ideas in this thread and I want to analyze
some of them.  First, there is the common assumption (not stated) that
over-estimating by 5% and under-estimating by 5% cause the same harm, which
is clearly false.  If I go to a restaurant and estimate the bill to be 5%
higher or 5% lower, assuming I have sufficient funds, under- or
over-estimating is probably fine.  If I am driving and under-estimate the
traction of my tires, I am probably fine, but if I over-estimate their
traction by 5%, I might crash.  Closer to home, Postgres is more tolerant
of memory usage under-estimation than over-estimation:

	https://momjian.us/main/blogs/pgblog/2018.html#December_7_2018

What I hear Robert saying is that unparameterized nested loops are much
more sensitive to misestimation than hash joins, and only slightly faster
than hash joins when the row counts are estimated accurately, so we might
want to avoid them by default.  Tom is saying that if we know the outer
side will have zero or one row, we should still use unparameterized nested
loops because those are not more sensitive to misestimation than hash
joins, and are slightly faster.

If that is accurate, I think the big question is how common are cases where
the outer side can't be proven to have zero or one row and nested loops are
enough of a win to risk their greater sensitivity to misestimation.  If
such cases are uncommon, it seems we could just code the optimizer to use
hash joins in those cases without a user-visible knob, beyond the knob that
already turns off nested loop joins.

Peter's comment about having nodes in the executor that adjust to the row
counts they find is interesting, and eventually might be necessary once we
are sure we have gone as far as we can without it.

-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.
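P.S.  To put rough numbers on the sensitivity argument, here is a toy
comparison.  The costs are invented and the models are grossly simplified
(an unparameterized nestloop rescans the inner side once per outer row,
while a hash join pays a one-time build cost plus a small per-row probe):

    #include <stdio.h>

    int
    main(void)
    {
        double  inner_scan_cost = 1000.0;   /* cost of one scan of the inner side */
        double  hash_build_cost = 1500.0;   /* one-time cost to build the hash table */
        double  probe_cost      = 0.01;     /* per-outer-row probe cost */
        double  outer_rows[]    = {1, 10, 1000, 1000000};

        for (int i = 0; i < 4; i++)
        {
            double  n = outer_rows[i];
            double  nestloop = n * inner_scan_cost;
            double  hashjoin = hash_build_cost + n * probe_cost;

            printf("outer rows %8.0f: nestloop %14.0f   hashjoin %10.0f\n",
                   n, nestloop, hashjoin);
        }
        return 0;
    }

With one outer row the nestloop is the slight winner because it skips the
hash build; at a million outer rows it is roughly five orders of magnitude
more expensive, while the hash join barely moves.  That is the kind of
asymmetry I am describing above.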