On Mon, Jun 21, 2021 at 1:11 PM Peter Geoghegan <p...@bowt.ie> wrote:
> On Mon, Jun 21, 2021 at 9:52 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > > I'm not so sure that it is. The point isn't the risk, even if it
> > > could be calculated. The point is the downsides of being wrong are
> > > huge and pretty much unbounded, whereas the benefits of being right
> > > are tiny and bounded. It almost doesn't matter what the underlying
> > > probabilities are.
> >
> > You're throwing around a lot of pejorative adjectives there without
> > having bothered to quantify any of them. This sounds less like a
> > sound argument to me than like a witch trial.
>
> I'm not sure what you mean by pejorative. Isn't what I said about
> downsides and upsides pretty accurate?
Yeah, I don't see why Peter's characterization deserves to be labelled
as pejorative here.

A hash join or merge join or parameterized nested loop can turn out to
be slower than some other algorithm, but all of those approaches have
some component that tends to make the asymptotic cost less than the
product of the sizes of the inputs. I don't think that's true in
absolutely every case; for example, if a merge join has every row
duplicated on both sides, it will have to scan every inner tuple once
per outer tuple, just like a nested loop, and the other algorithms also
are going to degrade toward O(NM) performance in the face of many
duplicates. Also, a hash join can be pretty close to that if it needs a
shazillion batches. But in normal cases, any algorithm other than an
unparameterized nested loop tends to read each input tuple on each side
ONCE, so the cost is more like the sum of the input sizes than the
product.

And there's nothing pejorative in saying that N + M can be less than
N * M by an unbounded amount. That's just the facts.

--
Robert Haas
EDB: http://www.enterprisedb.com
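P.S. To make the N + M versus N * M point concrete, here is a toy,
self-contained C sketch (not PostgreSQL executor code; the input sizes,
key values, and hash scheme are all made up for the illustration). It
joins two integer arrays on equality, once with an unparameterized
nested loop and once with a simple hash join, and counts how many tuple
visits each strategy makes.

/*
 * Toy illustration only: compare the work done by an unparameterized
 * nested loop join and a hash join over the same two inputs.
 */
#include <stdio.h>
#include <string.h>

#define N 10000             /* outer input size (arbitrary) */
#define M 10000             /* inner input size (arbitrary) */
#define NBUCKETS 16384      /* power of two, > M, so chains stay short */

int
main(void)
{
    static int  outer[N], inner[M];
    long        nestloop_visits = 0, nestloop_matches = 0;
    long        hashjoin_visits = 0, hashjoin_matches = 0;
    int         i, j;

    /* Fill both inputs with arbitrary, mostly-distinct keys. */
    for (i = 0; i < N; i++)
        outer[i] = i * 3;
    for (j = 0; j < M; j++)
        inner[j] = j * 5;

    /* Unparameterized nested loop: every inner row for every outer row. */
    for (i = 0; i < N; i++)
        for (j = 0; j < M; j++)
        {
            nestloop_visits++;          /* N * M visits, always */
            if (outer[i] == inner[j])
                nestloop_matches++;     /* would emit a join row here */
        }

    /* Hash join: build once over the inner side, probe once per outer row. */
    {
        static int  bucket[NBUCKETS];
        static int  next[M];            /* simple chained hash table */

        memset(bucket, -1, sizeof(bucket));
        for (j = 0; j < M; j++)
        {
            int         h = (unsigned int) inner[j] & (NBUCKETS - 1);

            next[j] = bucket[h];
            bucket[h] = j;
            hashjoin_visits++;          /* build phase: M visits */
        }
        for (i = 0; i < N; i++)
        {
            int         h = (unsigned int) outer[i] & (NBUCKETS - 1);

            /* probe phase: walk the (short) chain for this bucket */
            for (j = bucket[h]; j != -1; j = next[j])
            {
                hashjoin_visits++;
                if (inner[j] == outer[i])
                    hashjoin_matches++;
            }
        }
    }

    printf("nested loop: %ld visits, %ld matches\n",
           nestloop_visits, nestloop_matches);
    printf("hash join:   %ld visits, %ld matches\n",
           hashjoin_visits, hashjoin_matches);
    return 0;
}

Both strategies find the same matches, but with N = M = 10,000 the
nested loop makes 100,000,000 tuple visits while the hash join makes on
the order of 20,000, and that gap keeps growing without bound as the
inputs grow, which is the asymmetry being discussed above.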