On Mon, Mar 2, 2026 at 9:50 PM David Geier <[email protected]> wrote: > The very last rewrite combines both cases. The rewritten query then > looks like: > > SELECT t1.c1 FROM T1 WHERE > t1.c1 IS NOT NULL AND > NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND > NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL)
I'm still not convinced this rewrite is correct. As I mentioned earlier, it breaks down if t2 is empty while t1 contains NULL rows. For example: CREATE TABLE t1 (c1 int); CREATE TABLE t2 (c1 int); INSERT INTO t1 VALUES (1), (NULL); SELECT t1.c1 FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2); c1 ---- 1 (2 rows) SELECT t1.c1 FROM T1 WHERE t1.c1 IS NOT NULL AND NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL); c1 ---- 1 (1 row) > Seems reasonable to start with the non-NULL variant, though there are > certainly cases where there's no PK / unique index on the relevant columns. Yeah. I don't know how to optimize nullable NOT IN clauses. It seems quite difficult to handle safely purely via query transformations. Maybe we can explore adding a dedicated Null-Aware Anti-Join execution node, much like Oracle's approach. But that is definitely beyond the scope of this current patch. - Richard
