Re: Expression push down from Join Node to below node.

Shubhankar Anand Kulkarni Fri, 13 Jun 2025 06:05:30 -0700

Hi,


Apologies for delayed response, 
I believe there may be a slight misunderstanding regarding the use case we're 
aiming to improve. Let me clarify it from the beginning.



"""

Example :

SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2 where udf1(t1.a) = t2.a and 
t1.a > 5;

"""



In this query, the expression udf1(t1.a) appears both in the join condition

and in the top-level projection. Without optimization, udf1(t1.a) would be

evaluated twice: once during join qualification and again during projection.



Our goal is to detect such common expressions and pushes them down to

the projection list of the appropriate lower-level plan node—whether it is

on the outer or inner side of the join—based on where the expression's input

columns originate from.



This ensures that the expression is evaluated only once during execution

and the computed result is reused in both the join condition and the final

projection. This is particularly beneficial when dealing with expensive

computations (e.g., Encrypt/Decrypt UDFs), where reducing redundant

evaluations can significantly improve performance.



Also,if we push the udf1  to foreign scan of t1 the filter ( t1.a > 5 ) will be

evaluated first and then we will call the udf1 for the qualifing rows.



Whereas, in your first reply you stated that we need to call the udf1  prior

to the filter( here t1.a >5 ), hence needed to discuss this point once with you.



>From execScan.c, we can see that the qual is evaluated first and then

the projection is formed.

if (qual == NULL || ExecQual(qual, econtext))

{

     /*

     * Found a satisfactory scan tuple.

     */

     if (projInfo)

     {

          /*

          * Form a projection tuple, store it in the result tuple slot

          * and return it.

          */

          return ExecProject(projInfo);

     }

     else

     {

          /*

          * Here, we aren't projecting, so just return scan tuple.

          */

          return slot;

     }

}




> Hash Cond involves two sides of the relations, how could we push down

> it under one side of it? I can't follow up on this well, have you a draft

> plan / code for your idea?



We are not pushing down the entire hash condition as a whole which is not 
possible.

Instead, we just push those expression, which involves only columns from any 
one side ( inner side or outer side)

of the join.



In our case, udf1(t1.a) is a function expression that only involves

columns from the outer side (t1). Therefore, we can safely push it down to the

outer join path's target list.



Actually in general, we are assuming that it is safe and not a overhead

to push down any such expression involved in hash-condition to the respective

lower node's targetlist, as the expression will be evaluated same number of 
times

as it would have been evaluated without the push down.

This enables us to reuse the result of the expression that's present  in both 
the join condition and the final projection.



If you see in createplan.c, we are calling the following functions:



hashclauses = get_actual_clauses(best_path->path_hashclauses);

/*

* Rearrange hashclauses, if needed, so that the outer variable is always

* on the left.

*/

hashclauses = get_switched_clauses(best_path->path_hashclauses,

   best_path->jpath.outerjoinpath->parent->relids);

OpExpr *clause = (OpExpr *) linitial(hashclauses);





After this, we know that the outer varaiable is on left side and inner is in 
right side

of opExpr so we can recurse into, linitial(clause) and find any FuncExpr ( or 
any other expr node)

and then push it to the outer join-path as shown below:



FuncExpr *fexpr = (FuncExpr *) node;



best_path->jpath.outerjoinpath->pathtarget->exprs =

lappend(best_path->jpath.outerjoinpath->pathtarget->exprs, fexpr);







Hence the tentative draft code would be like:



OpExpr *clause = (OpExpr *) linitial(hashclauses);

node = (Node *) linitial(clause->args)

if(IsA(node, FuncExpr)) {               //for now directly checking for funcexpr



FuncExpr *fexpr = (FuncExpr *) node;

best_path->jpath.outerjoinpath->pathtarget->exprs =

lappend(best_path->jpath.outerjoinpath->pathtarget->exprs, fexpr);

}



The next step would be to replace the funcExpr in this clause, with a new Var



newVar = makeVar(

OUTER_VAR,

list_length(best_path->jpath.outerjoinpath->pathtarget->exprs),

exprType(((Node *) fexpr)), exprTypmod((Node *) fexpr),

exprCollation((Node *) fexpr), 0

            );



linitial(clause->args) = (Node *) newVar; //replace the funcExpr with the new 
Var.



Modified PG PLAN:

Observe the targetlist of cipher table, present in the below query plan,

the decrypt function is coming from the foreign scan of cipher table itself.



QUERY PLAN

-----------------------------------------------------------------------------------------------------------

CStore Version: "5.1.0"

Branch: "check_decrypt_slowness"

Build Date: "Mon May 19 13:27:24 2025"

Executor: Vectorized exec



Hash Left Join (cost=22.74..73.43 rows=1 width=65)

  Output: t1.sensitive_data1, aes256_cbc_decrypt(cipher c1 
,'\x32303235'::bytea, '\x32303235::bytea)

  Hash Cond: (t1.sensitive_data1 = (aes256_cbc_decrypt(cipher.c1, 
"\x32303235'::bytea, '\x32303235::bytea)))

  -> Foreign Scan on public.benchmark encytion t1 (cost=0.00..22.72 rows=1 
width=33)

Output: t1.sensitive_data1

cstore Dir: /home/shubha/Documents/zoho/postgres17/data/cstore_fdw/116838/135230

CStore Table Size: 1268 kB

  -> Hash (cost=22.72.. 22.72 rows=1 width=33)

   Output: cipher. c1, (aes256_cbc_decrypt(cipher.c1, '\x32303235"::bytea, 
'\x32303235:: bytea))

   -> Foreign Scan on public.cipher (cost-0.00..50.68 rows=1 width=49)

Output: cipher.c1, aes256_cbc_decrypt(cipher.c1, '\x32303235'::bytea, 
'\x32303235 :: bytea)

CStore Dir: /home/shubha/Documents/zoho/postgres17/data/cstore_fdw/116838/116931

CStore Table Size: 2424 kB 



Planning Time: 9.436 ms

(19 rows)



Please share your thoughts on this approach hope this clarifies the use case

and the approach we're taking to optimize the query execution. If you have

any further questions or suggestions, please feel free to share.



Thanks and Regards

Shubhankar K.

ZLabs-CStore
Member of Technical Staff, Zoho














---- On Wed, 28 May 2025 23:53:46 +0530 Shubhankar Anand Kulkarni 
<shubhankar...@zohocorp.com> wrote ---



Can you please resolve the doubt mentioned in the below thread?


Thanks &Regards


Shubhankar K
Zlabs-CStore
Member of Technical Staff, Zoho







---- On Mon, 26 May 2025 16:58:12 +0530 Shubhankar Anand Kulkarni < 
mailto:shubhankar...@zohocorp.com > wrote ---












Hi Andy,

Thanks for the reply. As suggested, I tried using CTE which seems to solve the 
case. This is the updated plan: 


                                                                                
               QUERY PLAN                                                       
                                         

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Hash Right Join  (cost=2500502930.90..2501167775.90 rows=10000000 width=65) 
(actual time=1450.743..12200.977 rows=10000000 loops=1)

   Output: t1.sensitive_data1, t2.column1

   Hash Cond: (t2.column1 = t1.sensitive_data1)

   CTE t2

     ->  Foreign Scan on public.cipher  (cost=0.00..2500163471.33 rows=10000000 
width=32) (actual time=0.095..6288.407 rows=10000000 loops=1)

           Output: aes256_cbc_decrypt(cipher.c1, 
'\x323032356b6579666f726373746f726561657332353663626374657374696e67'::bytea, 
'\x323032356373746f72656976666f7261657332353663626374657374696e67'::bytea)

           Read blocks: 2442 Skipped: 0

           File Mode: File Per Table

           CStore Dir: 
/home/shubha-18887/Documents/zoho/postgres17/data/cstore_fdw/116838/116931

           CStore Table Size: 2975 MB

   ->  CTE Scan on t2  (cost=0.00..200000.00 rows=10000000 width=32) (actual 
time=0.097..7626.693 rows=10000000 loops=1)

         Output: t2.column1

   ->  Hash  (cost=136334.57..136334.57 rows=10000000 width=33) (actual 
time=1448.676..1448.678 rows=10000000 loops=1)

         Output: t1.sensitive_data1

         Buckets: 131072  Batches: 128  Memory Usage: 5999kB

         ->  Foreign Scan on public.benchmark_encytion t1  
(cost=0.00..136334.57 rows=10000000 width=33) (actual time=0.143..513.782 
rows=10000000 loops=1)

               Output: t1.sensitive_data1

               Read blocks: 2442 Skipped: 0

               File Mode: File Per Table

               CStore Dir: 
/home/shubha-18887/Documents/zoho/postgres17/data/cstore_fdw/116838/135230

               CStore Table Size: 1987 MB

Query Identifier: 2330480792947742169

Planning Time: 3.829 ms

Execution Time: 12401.482 ms

(24 rows)





As seen, the expression got pushed down to respective foreign scan, reducing 
the overall query time reduced significantly, but there is an increase in the 
memory footprint.



Also the thing which you mentioned, 

I guess the reason would be once we push the function down to the "foreign 
scan" 

node, we need to run these function *before any other filter*, which may 

increase the number of calls of the function. e.g.


After looking into the code, it looks like we will first evaluate the qual, if 
it qualifies then only we will go for the projection. Meaning, even after 
expression push-down we will evaluate the expression on top filtered rows only. 
So I don't think it should have any concern in this case.


Also for this particular case, I tried playing around create_hashjoin_plan 
function in createplan.c, to push my hash-clause expression to below 
foreign-scan. Currently, if there is a single hash clause of form funcExpr op 
Var, I am appending the funcExpr to pathTarget of  respective table and 
replacing the funcExpr with a var. Is it a right place/approach to check or 
should we do changes while parsing in deconstruct_jointree(), in 
distiribute_quals_to_rels we can  update the appropriate baserestrictinfo. 
Pls share your thoughts on the same.






Thanks & Regards

Shubhankar K
ZLabs-CStore







---- On Thu, 22 May 2025 14:21:51 +0530 Andy Fan < mailto:zhihuifan1...@163.com 
> wrote ---












Shubhankar Anand Kulkarni < mailto:shubhankar...@zohocorp.com > writes: 
 
Hi, 
 
> SELECT sensitive_data1, column1 FROM benchmark_encytion AS t1 LEFT JOIN ( 
> SELECT aes256_cbc_decrypt( c1, '\x1234' :: 
> bytea, '\x5678' :: bytea ) AS column1 FROM cipher ) AS t2 ON 
> t1.sensitive_data1 = t2.column1; 
> 
> As you can see in the above Query, the join clause involves the column which 
> needs to be decrypted first and then joined 
> with other table and then in projection we need the needed decrypted values 
> to print as well, in this case the plan 
> generated by the PG is as mentioned below (refer to the image as well): 
> 
>                                                                               
>                          QUERY PLAN 
> 
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  
> 
> Hash Right Join  (cost=22.74..73.43 rows=1 width=65) 
>    Output: t1.sensitive_data1, aes256_cbc_decrypt(cipher.c1, 
> '4696e67'::bytea, '6e67'::bytea) 
>    Hash Cond: (aes256_cbc_decrypt(cipher.c1, '4696e67'::bytea, '6e67'::bytea) 
> = t1.sensitive_data1) 
>    ->  Foreign Scan on public.cipher  (cost=0.00..50.68 rows=1 width=49) 
>          Output: cipher.c1, cipher.c2, cipher.c3, cipher.c4, cipher.c5, 
> cipher.c6 
>          CStore Dir: 
> /home/shubha/Documents/zoho/postgres17/data/cstore_fdw/116838/116931 
>          CStore Table Size: 2424 kB 
>    ->  Hash  (cost=22.72..22.72 rows=1 width=33) 
>          Output: t1.sensitive_data1 
>          ->  Foreign Scan on public.benchmark_encytion t1  (cost=0.00..22.72 
> rows=1 width=33) 
>                Output: t1.sensitive_data1 
>                CStore Dir: 
> /home/shubha/Documents/zoho/postgres17/data/cstore_fdw/116838/135230 
>                CStore Table Size: 1268 kB 
> Query Identifier: 1810637692808683603 
> (14 rows) 
> 
> My doubt here is, while planing this join, why can't we parse the join clause 
> and pass the expressions involved there to 
> the respective scan nodes and use it above that wherever needed as a 
> Var? 
 
I guess the reason would be once we push the function down to the "foreign 
scan" 
node, we need to run these function *before any other filter*, which may 
increase the number of calls of the function. e.g. 
 
SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2 where t2.fid = t1.id; 
 
If we push down the udf1 to the timing of scaning t1, udf1 would be 
called 1000 times, but without the push down, it is called 1 times in 
the above case. IIRC, PostgreSQL assumes after the join, the total rows 
will be less. 
 
To your case especially, 
 
1. the number call of  aes256_cbc_decrypt will not be increased even we 
push down, however figuring out this fact needs some work being done in 
the very early of planing stage. which might be kind of complex. 
 
2. You can simply rewrite your query with materialized cte, I think 
that probably resolve your issue. 
 
WITH t2 MATERIALIZED as 
(SELECT aes256_cbc_decrypt( c1, '\x1234' ::bytea, '\x5678' :: bytea ) AS 
column1 FROM cipher) 
SELECT sensitive_data1, column1 FROM 
benchmark_encytion AS t1 left join t2 
on t1.sensitive_data1 = t2.column1; . 
 
For a general case,  I do want to share some intermediate result between 
ExecQual and ExecProject by storing the intermediate result into some 
special tts_values in TupleTableSlot. e.g. the case is: 
 
SELECT costly_udf(f1.a) FROM t1 JOIN t2 WHERE costly_udf(f1.a) = f2.a; 
 
In the past I want to use similar idea to bypass some duplicated detoast 
effort at [1], but Robert thought it was unaccptable, then the project 
is dead. Your case makes me think about it again. 
 
[1] 
https://www.postgresql.org/message-id/CA%2BTgmoZfpruG%3DVvqeKLiRC95VbbxEyxBm8d1r3YOpaedkQuL4A%40mail.gmail.com
  
 
-- 
Best Regards 
Andy Fan

Re: Expression push down from Join Node to below node.

Reply via email to