Thanks for the input.

postgres_fdw seems to fetch the entire remote table even when all I use in the join 
is the id column from the remote table.  I know it is possible to query for the 
missing ids first and then perform the delete (see the sketch after the plan below), 
but I wonder why all types of joins are so inefficient.


   DELETE FROM tbl_local lcl
   WHERE NOT EXISTS (
       SELECT id FROM tbl_link lnk
       WHERE lnk.id = lcl.id
   );


"Delete on tbl_local lcl  (cost=114.59..122.14 rows=3 width=730) (actual 
time=62153.636..62153.639 rows=0 loops=1)"
"  ->  Hash Anti Join  (cost=114.59..122.14 rows=3 width=730) (actual 
time=62153.633..62153.636 rows=0 loops=1)"
"        Hash Cond: (lcl.id = lnk.id)"
"        ->  Seq Scan on tbl_local lcl  (cost=0.00..7.11 rows=111 width=14) 
(actual time=0.022..0.062 rows=111 loops=1)"
"        ->  Hash  (cost=113.24..113.24 rows=108 width=732) (actual 
time=55984.489..55984.490 rows=112 loops=1)"
"              Buckets: 1024 (originally 1024)  Batches: 32 (originally 1)  
Memory Usage: 240024kB"
"              ->  Foreign Scan on tbl_link lnk  (cost=100.00..113.24 rows=108 
width=732) (actual time=48505.926..51893.668 rows=112 loops=1)"
"Planning Time: 0.237 ms"
"Execution Time: 62184.253 ms"

From: Vijaykumar Jain [mailto:vijaykumarjain.git...@gmail.com]
Sent: Thursday, January 6, 2022 2:53 PM
To: Avi Weinberg <a...@gilat.com>
Cc: Kyotaro Horiguchi <horikyota....@gmail.com>; pgsql-performa. <pgsql-performa...@postgresql.org>
Subject: Re: Same query 10000x More Time


On Thu, Jan 6, 2022, 3:50 PM Avi Weinberg <a...@gilat.com> wrote:
Hi Kyotaro Horiguchi and Vijaykumar Jain,

Thanks for your quick reply!

I understand that the join in the slow query caused this problem.  However, why 
can't Postgres evaluate the table behind the "IN" clause (select 140 as id union 
select 144 union select 148) and, based on its size, decide which is more optimal:
1. Push the local table to the linked server and perform the join there, or
2. Pull the linked server's table to the local side and perform the join locally?

In my case the local table is a million times smaller than the remote table.
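
One way to see which side does the work, and exactly what is shipped, is EXPLAIN 
(VERBOSE): the Remote SQL line under the Foreign Scan node shows the query 
postgres_fdw actually sends. A sketch against the tables above:

   -- Without ANALYZE the statement is only planned, so nothing is deleted.
   -- Look for "Remote SQL:" under the Foreign Scan node to see which
   -- columns postgres_fdw requests from the remote table.
   EXPLAIN (VERBOSE)
   DELETE FROM tbl_local lcl
   WHERE NOT EXISTS (
       SELECT id FROM tbl_link lnk
       WHERE lnk.id = lcl.id
   );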

I understand that when the optimizer makes a decision, it uses statistics to pick 
the least expensive plan.
I could reply, but I am pretty sure an analogy with a local setup of a big and a 
small table is not the same as a small local table joined to a big remote table.
I would leave it to the experts here, unless you are open to reading the source of 
the postgres_fdw extension.
https://github.com/postgres/postgres/blob/master/contrib/postgres_fdw/postgres_fdw.c


If this goes beyond cost calculation, there must be a reason why it happens.
Otherwise, if it is purely cost based, you can try tweaking the cost parameters and 
see whether you get a better plan.
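
For instance, postgres_fdw exposes per-server cost knobs; a sketch, with the server 
name link_server invented for illustration (the option names are real postgres_fdw 
settings):

   -- Hypothetical server name; adjust to your foreign server.
   ALTER SERVER link_server OPTIONS (
       ADD use_remote_estimate 'true',  -- obtain row estimates from the remote planner
       ADD fdw_startup_cost '100',      -- assumed startup overhead of each remote query
       ADD fdw_tuple_cost '0.01'        -- assumed cost per row transferred
   );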

For example, if you force the parallel cost settings to 0 on the foreign server, it 
may use parallel workers and speed things up, but in my experience, fighting the 
optimizer is mostly asking for trouble :)
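
A sketch of that tweak, assuming "parallel cost" refers to the parallel_setup_cost 
and parallel_tuple_cost settings, run on the remote server itself (remotedb is a 
placeholder database name):

   -- Run on the remote server; remotedb is a placeholder.
   ALTER DATABASE remotedb SET parallel_setup_cost = 0;
   ALTER DATABASE remotedb SET parallel_tuple_cost = 0;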
