Hi all,

     consider the following SQL:

================================================================================================
   gpadmin=# explain (verbose, costs off)
     select * from t,
                             (select a from generate_series(1, 1)a)x,
                             (select a from generate_series(1, 1)a)y
      where ((x.a+y.a)/4.0) > random();
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Nested Loop
   Output: t.a, t.b, a.a, a_1.a
   ->  Nested Loop
         Output: a.a, a_1.a
         Join Filter: (((((a.a + a_1.a))::numeric / 4.0))::double precision > 
random())
         ->  Function Scan on pg_catalog.generate_series a
               Output: a.a
               Function Call: generate_series(1, 1)
         ->  Function Scan on pg_catalog.generate_series a_1
               Output: a_1.a
               Function Call: generate_series(1, 1)
   ->  Seq Scan on public.t
         Output: t.a, t.b
(13 rows)

================================================================================================

        The where clause is "pushed down to the x,y" because it only references 
these two relations.

        The original query tree's join tree is like:
FromExpr []
        [fromlist]
                RangeTblRef [rtindex=1]
                RangeTblRef [rtindex=4]
                RangeTblRef [rtindex=5]
        [quals]
                OpExpr [opno=674 opfuncid=297 opresulttype=16 opretset=false]
                        FuncExpr [funcid=1746 funcresulttype=701 
funcretset=false funcvariadic=false
                                  funcformat=COERCE_IMPLICIT_CAST]
                                OpExpr [opno=1761 opfuncid=1727 
opresulttype=1700 opretset=false]
                                        FuncExpr [funcid=1740 
funcresulttype=1700 funcretset=false funcvariadic=false
                                                         
funcformat=COERCE_IMPLICIT_CAST]
                                                OpExpr [opno=551 opfuncid=177 
opresulttype=23 opretset=false]
                                                        Var [varno=4 varattno=1 
vartype=23 varnoold=4 varoattno=1]
                                                        Var [varno=5 varattno=1 
vartype=23 varnoold=5 varoattno=1]
                                        Const [consttype=1700 constlen=-1 
constvalue=94908966309104 constisnull=false
                                                     constbyval=false]
                        FuncExpr [funcid=1598 funcresulttype=701 
funcretset=false funcvariadic=false
                                          funcformat=COERCE_EXPLICIT_CALL]

    It seems the semantics it wants to express is:   filter after join all the 
tables.


    Thus maybe a plan like

Nested Loop
 Join Filter: (((((a.a + a_1.a))::numeric / 4.0))::double precision > random())
 ->  Nested Loop
         ->  Function Scan on generate_series a
         ->  Function Scan on generate_series a_1
   ->  Seq Scan on t  (cost=0.00..32.60 rows=2260 width=8)

  May also be reasonable because it is just the direct translation from the 
original query tree.

   The above plans may have different property:
      * the first one, if we push down, can only produce 2 results: 0 rows, or 
10 rows. No third possibility
      * the second one, will output 0 ~ 10 rows with equal probability.


    I am wondering if we should consider volatile functions in restrictinfo 
when try to distribute_restrictinfo_to_rels?


Best,
Zhenghua Lyu

Reply via email to