Hi I am checking one customer query where there are some empty tables in a nontrivial query. The fixed estimation on empty tables in Postgres are working very well, but I found another issue.
create table test_a(id int); create table test_b(id int); insert into test_a select generate_series(1,100000); analyze test_a, test_b; with zero row in test_b postgres optimizer uses nested loop postgres=# explain analyze select * from test_a where not exists(select * from test_b where test_a.id=test_b.id); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡ │ Nested Loop Anti Join (cost=0.00..2693.00 rows=99999 width=4) (actual time=0.024..90.530 rows=100000 loops=1) │ │ Join Filter: (test_a.id = test_b.id) │ │ -> Seq Scan on test_a (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.017..8.171 rows=100000 loops=1) │ │ -> Seq Scan on test_b (cost=0.00..0.00 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=100000) │ │ Planning Time: 0.153 ms │ │ Execution Time: 94.331 ms │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (6 rows) but if I add one fake row to test_b, I got hash antijoin insert into test_b values(-1); analyze test_b; postgres=# explain analyze select * from test_a where not exists(select * from test_b where test_a.id=test_b.id); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡ │ Hash Anti Join (cost=1.02..2706.51 rows=99999 width=4) (actual time=0.026..24.474 rows=100000 loops=1) │ │ Hash Cond: (test_a.id = test_b.id) │ │ -> Seq Scan on test_a (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.010..8.522 rows=100000 loops=1) │ │ -> Hash (cost=1.01..1.01 rows=1 width=4) (actual time=0.008..0.010 rows=1 loops=1) │ │ Buckets: 1024 Batches: 1 Memory Usage: 9kB │ │ -> Seq Scan on test_b (cost=0.00..1.01 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=1) │ │ Planning Time: 0.186 ms │ │ Execution Time: 28.334 ms │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (8 rows) Now the query is almost 3 times faster. Probably this is a cost issue, because cost is very similar. With fake row I got better plan. But when I disable hashjoin I got more expensive but better plan too postgres=# explain analyze select * from test_a where not exists(select * from test_b where test_a.id=test_b.id); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡ │ Nested Loop Anti Join (cost=0.00..2944.01 rows=99999 width=4) (actual time=0.100..47.360 rows=100000 loops=1) │ │ Join Filter: (test_a.id = test_b.id) │ │ Rows Removed by Join Filter: 100000 │ │ -> Seq Scan on test_a (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.019..8.586 rows=100000 loops=1) │ │ -> Materialize (cost=0.00..1.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=100000) │ │ -> Seq Scan on test_b (cost=0.00..1.01 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1) │ │ Planning Time: 0.176 ms │ │ Execution Time: 51.248 ms │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (8 rows) On empty table the Materialize node helps 50%