Can you please resolve the doubt mentioned in the below thread? Thanks &Regards
Shubhankar K Zlabs-CStore Member of Technical Staff, Zoho ---- On Mon, 26 May 2025 16:58:12 +0530 Shubhankar Anand Kulkarni <shubhankar...@zohocorp.com> wrote --- Hi Andy, Thanks for the reply. As suggested, I tried using CTE which seems to solve the case. This is the updated plan: QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Right Join (cost=2500502930.90..2501167775.90 rows=10000000 width=65) (actual time=1450.743..12200.977 rows=10000000 loops=1) Output: t1.sensitive_data1, t2.column1 Hash Cond: (t2.column1 = t1.sensitive_data1) CTE t2 -> Foreign Scan on public.cipher (cost=0.00..2500163471.33 rows=10000000 width=32) (actual time=0.095..6288.407 rows=10000000 loops=1) Output: aes256_cbc_decrypt(cipher.c1, '\x323032356b6579666f726373746f726561657332353663626374657374696e67'::bytea, '\x323032356373746f72656976666f7261657332353663626374657374696e67'::bytea) Read blocks: 2442 Skipped: 0 File Mode: File Per Table CStore Dir: /home/shubha-18887/Documents/zoho/postgres17/data/cstore_fdw/116838/116931 CStore Table Size: 2975 MB -> CTE Scan on t2 (cost=0.00..200000.00 rows=10000000 width=32) (actual time=0.097..7626.693 rows=10000000 loops=1) Output: t2.column1 -> Hash (cost=136334.57..136334.57 rows=10000000 width=33) (actual time=1448.676..1448.678 rows=10000000 loops=1) Output: t1.sensitive_data1 Buckets: 131072 Batches: 128 Memory Usage: 5999kB -> Foreign Scan on public.benchmark_encytion t1 (cost=0.00..136334.57 rows=10000000 width=33) (actual time=0.143..513.782 rows=10000000 loops=1) Output: t1.sensitive_data1 Read blocks: 2442 Skipped: 0 File Mode: File Per Table CStore Dir: /home/shubha-18887/Documents/zoho/postgres17/data/cstore_fdw/116838/135230 CStore Table Size: 1987 MB Query Identifier: 2330480792947742169 Planning Time: 3.829 ms Execution Time: 12401.482 ms (24 rows) As seen, the expression got pushed down to respective foreign scan, reducing the overall query time reduced significantly, but there is an increase in the memory footprint. Also the thing which you mentioned, I guess the reason would be once we push the function down to the "foreign scan" node, we need to run these function *before any other filter*, which may increase the number of calls of the function. e.g. After looking into the code, it looks like we will first evaluate the qual, if it qualifies then only we will go for the projection. Meaning, even after expression push-down we will evaluate the expression on top filtered rows only. So I don't think it should have any concern in this case. Also for this particular case, I tried playing around create_hashjoin_plan function in createplan.c, to push my hash-clause expression to below foreign-scan. Currently, if there is a single hash clause of form funcExpr op Var, I am appending the funcExpr to pathTarget of respective table and replacing the funcExpr with a var. Is it a right place/approach to check or should we do changes while parsing in deconstruct_jointree(), in distiribute_quals_to_rels we can update the appropriate baserestrictinfo. Pls share your thoughts on the same. Thanks & Regards Shubhankar K ZLabs-CStore ---- On Thu, 22 May 2025 14:21:51 +0530 Andy Fan < mailto:zhihuifan1...@163.com > wrote --- Shubhankar Anand Kulkarni < mailto:shubhankar...@zohocorp.com > writes: Hi, > SELECT sensitive_data1, column1 FROM benchmark_encytion AS t1 LEFT JOIN ( > SELECT aes256_cbc_decrypt( c1, '\x1234' :: > bytea, '\x5678' :: bytea ) AS column1 FROM cipher ) AS t2 ON > t1.sensitive_data1 = t2.column1; > > As you can see in the above Query, the join clause involves the column which > needs to be decrypted first and then joined > with other table and then in projection we need the needed decrypted values > to print as well, in this case the plan > generated by the PG is as mentioned below (refer to the image as well): > > > QUERY PLAN > > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > > Hash Right Join (cost=22.74..73.43 rows=1 width=65) > Output: t1.sensitive_data1, aes256_cbc_decrypt(cipher.c1, > '4696e67'::bytea, '6e67'::bytea) > Hash Cond: (aes256_cbc_decrypt(cipher.c1, '4696e67'::bytea, '6e67'::bytea) > = t1.sensitive_data1) > -> Foreign Scan on public.cipher (cost=0.00..50.68 rows=1 width=49) > Output: cipher.c1, cipher.c2, cipher.c3, cipher.c4, cipher.c5, > cipher.c6 > CStore Dir: > /home/shubha/Documents/zoho/postgres17/data/cstore_fdw/116838/116931 > CStore Table Size: 2424 kB > -> Hash (cost=22.72..22.72 rows=1 width=33) > Output: t1.sensitive_data1 > -> Foreign Scan on public.benchmark_encytion t1 (cost=0.00..22.72 > rows=1 width=33) > Output: t1.sensitive_data1 > CStore Dir: > /home/shubha/Documents/zoho/postgres17/data/cstore_fdw/116838/135230 > CStore Table Size: 1268 kB > Query Identifier: 1810637692808683603 > (14 rows) > > My doubt here is, while planing this join, why can't we parse the join clause > and pass the expressions involved there to > the respective scan nodes and use it above that wherever needed as a > Var? I guess the reason would be once we push the function down to the "foreign scan" node, we need to run these function *before any other filter*, which may increase the number of calls of the function. e.g. SELECT udf1(t1.a) FROM t1_1000row t1, t2_1row t2 where t2.fid = t1.id; If we push down the udf1 to the timing of scaning t1, udf1 would be called 1000 times, but without the push down, it is called 1 times in the above case. IIRC, PostgreSQL assumes after the join, the total rows will be less. To your case especially, 1. the number call of aes256_cbc_decrypt will not be increased even we push down, however figuring out this fact needs some work being done in the very early of planing stage. which might be kind of complex. 2. You can simply rewrite your query with materialized cte, I think that probably resolve your issue. WITH t2 MATERIALIZED as (SELECT aes256_cbc_decrypt( c1, '\x1234' ::bytea, '\x5678' :: bytea ) AS column1 FROM cipher) SELECT sensitive_data1, column1 FROM benchmark_encytion AS t1 left join t2 on t1.sensitive_data1 = t2.column1; . For a general case, I do want to share some intermediate result between ExecQual and ExecProject by storing the intermediate result into some special tts_values in TupleTableSlot. e.g. the case is: SELECT costly_udf(f1.a) FROM t1 JOIN t2 WHERE costly_udf(f1.a) = f2.a; In the past I want to use similar idea to bypass some duplicated detoast effort at [1], but Robert thought it was unaccptable, then the project is dead. Your case makes me think about it again. [1] https://www.postgresql.org/message-id/CA%2BTgmoZfpruG%3DVvqeKLiRC95VbbxEyxBm8d1r3YOpaedkQuL4A%40mail.gmail.com -- Best Regards Andy Fan