Implement predicate propagation for non-equivalence clauses

Richard Guo Tue, 04 Sep 2018 23:34:53 -0700

Hi,

As we know, current planner will generate additional restriction clauses
from
equivalence clauses. This will generally lower the total cost because some
of
tuples may be filtered out before joins.


In this patch, we are trying to do the similar deduction, from
non-equivalence
clauses, that is, A=B AND f(A) implies A=B AND f(A) and f(B), under some
restrictions on f. It can be turned on/off by GUC
enable_predicate_propagation.
This is ported from Greenplum database.

The idea is to collect the non-equivalence clauses in a dedicated list. Then
for each RestrictInfo in the list, we replace its expression with another
equivalent one and make a new RestrictInfo. This is done after all
equivalence
classes have been built. For example, if a RestrictInfo stands for A>10, and
if A and B are in the same equivalence class, we generate a new
RestrictInfo by
replacing A with B, so we get B>10.

This patch will introduce extra cost for relation scan, due to the cost of
evaluating the new implied quals. Meanwhile, since the extra filter may
reduce
the number of tuples returned by the scan, it may lower the cost of
following
joins. So, whether we will get a better plan depends on the selectivity of
the
implied quals.

Below is a simple demo, in which this patch will deduce an addition qual
(b.i <
10 in the first query and b.i > 10 in the second query).

create table a (i int);
create table b (i int);
insert into a select generate_series(1,10000);
insert into b select generate_series(1,10000);
analyze a;
analyze b;


-- low selectivity

set enable_predicate_propagation to off;
explain select * from a,b where a.i = b.i and a.i < 10;
                          QUERY PLAN
---------------------------------------------------------------
 Hash Join  (cost=170.11..352.70 rows=9 width=8)
   Hash Cond: (b.i = a.i)
   ->  Seq Scan on b  (cost=0.00..145.00 rows=10000 width=4)
   ->  Hash  (cost=170.00..170.00 rows=9 width=4)
         ->  Seq Scan on a  (cost=0.00..170.00 rows=9 width=4)
               Filter: (i < 10)
(6 rows)

set enable_predicate_propagation to on;
gpadmin=# explain select * from a,b where a.i = b.i and a.i < 10;
                          QUERY PLAN
---------------------------------------------------------------
 Nested Loop  (cost=0.00..341.24 rows=1 width=8)
   Join Filter: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..170.00 rows=9 width=4)
         Filter: (i < 10)
   ->  Materialize  (cost=0.00..170.04 rows=9 width=4)
         ->  Seq Scan on b  (cost=0.00..170.00 rows=9 width=4)
               Filter: (i < 10)
(7 rows)



-- high selectivity

set enable_predicate_propagation to off;
gpadmin=# explain select * from a,b where a.i = b.i and a.i > 10;
                            QUERY PLAN
-------------------------------------------------------------------
 Hash Join  (cost=270.00..577.36 rows=9990 width=8)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..170.00 rows=9990 width=4)
         Filter: (i > 10)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=4)
         ->  Seq Scan on b  (cost=0.00..145.00 rows=10000 width=4)
(6 rows)

set enable_predicate_propagation to on;
gpadmin=# explain select * from a,b where a.i = b.i and a.i > 10;
                            QUERY PLAN
------------------------------------------------------------------
 Hash Join  (cost=294.88..602.14 rows=9980 width=8)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..170.00 rows=9990 width=4)
         Filter: (i > 10)
   ->  Hash  (cost=170.00..170.00 rows=9990 width=4)
         ->  Seq Scan on b  (cost=0.00..170.00 rows=9990 width=4)
               Filter: (i > 10)
(7 rows)

In general, if the selectivity of the implied qual is low, the extra cost
for
relation scan is more likely to be compromised by the cost reduced in join,
and
vise versa.

Thanks
Richard

0001-Implement-predicate-propagation-for-non-equivalence-clauses-v1.patch
Description: Binary data

Implement predicate propagation for non-equivalence clauses

Reply via email to