Way back in [1] I proposed that we allow NOT IN subqueries to be converted into an anti-join where the subquery cannot return any NULL values. As Tom pointed out to me, I had neglected to consider that the outer side producing NULLs can cause the anti-join plan to produce incorrect results. The difference is that a NOT IN where the subquery returns no results filters nothing, otherwise it filters the nulls, plus the records that exist in the subquery.
More recently over on [2], Jim and Zheng have re-proposed making improvements in this area. Their ideas are slightly different from mine as they propose to add an OR .. IS NULL clause to the join condition to handle the outer side being NULL with empty subquery problem. Before Jim and Zheng's patch arrived I managed to fix the known problems with my 4-year-old patch thinking it would have been welcome, but it seems that's not the case, perhaps due to the differing ideas we have on how this should work. At that time I didn't think the other patch actually existed yet... oops Anyway, I don't really want to drop my patch as I believe what it does is correct and there's debate on the other thread about how good an idea adding these OR clauses to the join quals is... (forces nested loop plan (see [3])), but it appears Jim and Zheng are fairly set on that idea. Hence... I'm moving my patch here, so it can be debated without interfering with the other work that's going on in this area. There has also been some review of my patch in [4], and of course, originally in [1]. The background is really. 1. Seems fine to do this transformation when there are no nulls. 2. We don't want to cost anything to decide on to do the transformation or not, i.e do it regardless, in all possible cases where it's valid to do so. We already do that for NOT EXISTS, no apparent reason to think this case is any different. 3. Need to consider what planner overhead there is from doing this and failing to do the conversion due lack of evidence for no NULLs. I've not done #3, at least not with the latest patch. There's already a CF entry [5] for this patch, although its targeting PG13. The latest patch is attached. [1] https://www.postgresql.org/message-id/CAApHDvqRB-iFBy68%3DdCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/1550706289606-0.p...@n3.nabble.com [3] https://www.postgresql.org/message-id/CAKJS1f_ZwXtzPz6wDpBXgAVYuxforsqpc6hBw05Y6aPGcOONfA%40mail.gmail.com [4] https://www.postgresql.org/message-id/18203.1551543939%40sss.pgh.pa.us [5] https://commitfest.postgresql.org/22/2020/ -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
not_in_anti_join_v1.3.patch
Description: Binary data