Hi, As we know, current planner will generate additional restriction clauses from equivalence clauses. This will generally lower the total cost because some of tuples may be filtered out before joins.
In this patch, we are trying to do the similar deduction, from non-equivalence clauses, that is, A=B AND f(A) implies A=B AND f(A) and f(B), under some restrictions on f. It can be turned on/off by GUC enable_predicate_propagation. This is ported from Greenplum database. The idea is to collect the non-equivalence clauses in a dedicated list. Then for each RestrictInfo in the list, we replace its expression with another equivalent one and make a new RestrictInfo. This is done after all equivalence classes have been built. For example, if a RestrictInfo stands for A>10, and if A and B are in the same equivalence class, we generate a new RestrictInfo by replacing A with B, so we get B>10. This patch will introduce extra cost for relation scan, due to the cost of evaluating the new implied quals. Meanwhile, since the extra filter may reduce the number of tuples returned by the scan, it may lower the cost of following joins. So, whether we will get a better plan depends on the selectivity of the implied quals. Below is a simple demo, in which this patch will deduce an addition qual (b.i < 10 in the first query and b.i > 10 in the second query). create table a (i int); create table b (i int); insert into a select generate_series(1,10000); insert into b select generate_series(1,10000); analyze a; analyze b; -- low selectivity set enable_predicate_propagation to off; explain select * from a,b where a.i = b.i and a.i < 10; QUERY PLAN --------------------------------------------------------------- Hash Join (cost=170.11..352.70 rows=9 width=8) Hash Cond: (b.i = a.i) -> Seq Scan on b (cost=0.00..145.00 rows=10000 width=4) -> Hash (cost=170.00..170.00 rows=9 width=4) -> Seq Scan on a (cost=0.00..170.00 rows=9 width=4) Filter: (i < 10) (6 rows) set enable_predicate_propagation to on; gpadmin=# explain select * from a,b where a.i = b.i and a.i < 10; QUERY PLAN --------------------------------------------------------------- Nested Loop (cost=0.00..341.24 rows=1 width=8) Join Filter: (a.i = b.i) -> Seq Scan on a (cost=0.00..170.00 rows=9 width=4) Filter: (i < 10) -> Materialize (cost=0.00..170.04 rows=9 width=4) -> Seq Scan on b (cost=0.00..170.00 rows=9 width=4) Filter: (i < 10) (7 rows) -- high selectivity set enable_predicate_propagation to off; gpadmin=# explain select * from a,b where a.i = b.i and a.i > 10; QUERY PLAN ------------------------------------------------------------------- Hash Join (cost=270.00..577.36 rows=9990 width=8) Hash Cond: (a.i = b.i) -> Seq Scan on a (cost=0.00..170.00 rows=9990 width=4) Filter: (i > 10) -> Hash (cost=145.00..145.00 rows=10000 width=4) -> Seq Scan on b (cost=0.00..145.00 rows=10000 width=4) (6 rows) set enable_predicate_propagation to on; gpadmin=# explain select * from a,b where a.i = b.i and a.i > 10; QUERY PLAN ------------------------------------------------------------------ Hash Join (cost=294.88..602.14 rows=9980 width=8) Hash Cond: (a.i = b.i) -> Seq Scan on a (cost=0.00..170.00 rows=9990 width=4) Filter: (i > 10) -> Hash (cost=170.00..170.00 rows=9990 width=4) -> Seq Scan on b (cost=0.00..170.00 rows=9990 width=4) Filter: (i > 10) (7 rows) In general, if the selectivity of the implied qual is low, the extra cost for relation scan is more likely to be compromised by the cost reduced in join, and vise versa. Thanks Richard
0001-Implement-predicate-propagation-for-non-equivalence-clauses-v1.patch
Description: Binary data