Richard Guo <ri...@pivotal.io> writes: > On Tue, Jan 8, 2019 at 11:29 PM Tom Lane <t...@sss.pgh.pa.us> wrote: >> The bigger picture here, of course, is that check_outerjoin_delay's >> API is not at all well matched to what check_equivalence_delay needs: >> it has to make the extra bms_copy shown above, and it has no use >> for either of the relid sets that check_outerjoin_delay wants to >> return. I believe it's like that because check_outerjoin_delay is >> much older than the EquivalenceClass logic, and I didn't want to >> redesign it when I put in ECs, and I also didn't want to just >> duplicate all that logic. But maybe it's time to put some more thought >> into that, and try to find a refactoring of check_outerjoin_delay that >> suits both callers better.
> I am thinking about whether we can quickly tell a qual is > outerjoin_delayed or not by some bms_overlap operations. > A qual is regarded as being outerjoin_delayed if: > 1) It references any nullable rels of some OJ below this qual, and > 2) We haven't included all the rels of that OJ in relids. > For 1), we can collect all the nullable rels of OJs below this qual in > deconstruct_recurse() (like nullable_baserels, initsplan.c:976) and > bms_overlap this qual's relids with the collected nullable_relids. Yeah, although I'm not sure that there's a good way to produce that info in check_equivalence_delay? > For 2), I haven't figured out a good way. I'm not sure that there's any realistic way to do better given just the existing SpecialJoinInfo data structures. In particular, the fact that we consider the join_info_list to be unordered is a problem for this code --- it means we have to make multiple passes to ensure we've accounted for everything. I deliberately made the SpecialJoinInfo data structure as lean as possible when I first designed it, reasoning that that would minimize hazards from incorrect data structure contents that might force rewriting lots of code. But now that it's been stable a long while, maybe it's time to try to be smarter. For example, maybe it'd be sane to convert the list into a tree and include fields that represent the effects of not only each outer join but all its children. Then you could (in principle) do what check_outerjoin_delay needs to do just by looking at the current tree element. I don't have any concrete thoughts here, though, it's just handwaving. One issue that would need careful consideration is how to not break outer-join reordering rules; probably, you'd need to be sure that whatever derived data you add is defined in a way whereby it doesn't change in the face of legal join reorderings. regards, tom lane