Hi. In some cases, the planner underestimates FULL JOIN.
Example: postgres=# CREATE TABLE t AS SELECT x AS a, null AS b FROM generate_series(1, 10) x; postgres=# ANALYZE; postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2 ON t1.b = t2.b; QUERY PLAN ------------------------------------------------------------------------------------------- Hash Full Join (cost=1.23..2.37 rows=10 width=72) (actual rows=20 loops=1) Hash Cond: (t1.b = t2.b) -> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1) -> Hash (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1) Planning Time: 0.067 ms Execution Time: 0.052 ms (8 rows) Are these simple changes enough to improve this situation? diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index ef475d95a18..9cd43b778f3 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -5259,6 +5259,8 @@ calc_joinrel_size_estimate(PlannerInfo *root, break; case JOIN_FULL: nrows = outer_rows * inner_rows * fkselec * jselec; + if (2 * nrows < outer_rows + inner_rows) + nrows = outer_rows + inner_rows - nrows; if (nrows < outer_rows) nrows = outer_rows; if (nrows < inner_rows) There is no error in the above case: postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2 ON t1.b = t2.b; QUERY PLAN ------------------------------------------------------------------------------------------- Hash Full Join (cost=1.23..2.37 rows=20 width=72) (actual rows=20 loops=1) Hash Cond: (t1.b = t2.b) -> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1) -> Hash (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1) Planning Time: 0.069 ms Execution Time: 0.065 ms (8 rows)