Thomas Munro <thomas.mu...@gmail.com> writes: > While looking at bug #15857[1], I wondered why the following two > queries get different plans, given the schema and data from the bug > report: > ... > Here's my question: how could it ever be OK to sort/unique something > and put it in a hash table, but not OK to put exactly the same thing > in the hash table directly, with JOIN_SEMI logic to prevent multiple > matches? And likewise for the inner side of other join strategies.
I think you're thinking about it wrong. Or maybe there's some additional work we could put in to extend the logic. If we're considering the case where the semijoin is c.base_id = a.id, then we definitely can do "a SEMIJOIN c WHERE a.id = c.base_id". If we want to join b to c first, we can do so, but we have to unique-ify c before that join, and then both the b/c join and the later join with a can become plain inner joins. We *don't* get to skip unique-ifying c before joining to b and then apply a semijoin with a later, because in general that's going to result in the wrong number of output rows. (Example: if a is unique but b has multiple copies of a particular join key, and the key does appear in c, we should end with multiple output rows having that key, and we wouldn't.) It's possible that in some situations we could prove that semijoining later would work, but it seems like that'd require a great deal more analysis than the code does now. There'd also have to be tracking of whether the final a join still has to be a semijoin or not, which'd now depend on which path for b/c was being considered. > Or to put it in the language of the comment, how could you ever have > enough rels to do a join between B and unique(C), but not enough rels > to do a semi-join between B and C? If you're not joining C to B at all, but to some other rel A, you can't do it as a semijoin because you can't execute the semijoin qual correctly. In the particular case we're looking at here, it may be possible to prove that equivalence-class substitution from the original B/C semijoin qual gives rise to an A/C join qual that will work as an equivalent semijoin qual, and then it's OK to do A/C as a semijoin with that. But I'm not very sure what the proof conditions need to be, and I'm 100% sure that the code isn't making any such proof now. regards, tom lane