On Mon, Jun 21, 2021 at 6:41 AM David Rowley <dgrowle...@gmail.com> wrote: > For example, in an unparameterized Nested Loop that estimates the > outer Path to have 1 row will cost an entire additional inner cost if > there are 2 rows. With Hash Join the cost is just an additional > hashtable lookup, which is dead cheap. I don't know exactly how > add_path() would weigh all that up, but it seems to me that I wouldn't > take the risk unless I was 100% certain that the Nested Loop's outer > Path would only return 1 row exactly, if there was any chance at all > it could return more, I'd be picking some other join method.
It seems like everyone agrees that it would be good to do something about this problem, but the question is whether it's best to do something that tries to be general, or whether we should instead do something about this specific case. I favor the latter approach. Risk and uncertainty exist all over the place, but dealing with that in a general way seems difficult, and maybe unnecessary. Addressing the case of unparameterized nest loops specifically seems simpler, because it's easier to reason about what the alternatives are. Your last sentence here seems right on point to me. Basically, what you argue for there is disabling unparameterized nested loops entirely except when we can prove that the inner side will never generate more than one row. But, that's almost never going to be something that we can prove. If the inner side is coming from a table or sub-join, it can turn out to be big. As far as I can see, the only way that this doesn't happen is if it's something like a subquery that aggregates everything down to one row, or has LIMIT 1, but those are such special cases that I don't even think we should be worrying about them. So my counter-proposal is: how about if we split merge_unsorted_outer() into two functions, one of which generates nested loops only based on parameterized paths and the other of which generates nested loops based only on unparameterized paths, and then rejigger add_paths_to_joinrel() so that we do the latter between the steps that are currently number 5 and 6 and only if we haven't got any other paths yet? If somebody later comes up with logic for proving that the inner side can never have more than 1 row, we can let this be run in those cases as well. In the meantime, if somebody does something like a JOIN b ON a.x < b.x, we will still generate these paths because there's no other approach, or similarly if it's a.x = b.x but for some strange type that doesn't have a hash-joinable or merge-joinable equality operator. But otherwise we omit those paths entirely. -- Robert Haas EDB: http://www.enterprisedb.com