On Thu, 6 Jan 2022 at 20:01, Avi Weinberg <a...@gilat.com> wrote: > Thanks for the input > > > > postgres_fdw seems to bring the entire table even if all I use in the join > is just the id from the remote table. I know it is possible to query for > the missing ids and then perform the delete, but I wonder why all types of > joins are so inefficient. > > >
just for fun, please do not do this. I tried out multiple options where we join a small local table to a huge remote table with multiple plan skip settings. postgres@db:~/playground$ psql psql (14beta1) Type "help" for help. postgres=# \c localdb You are now connected to database "localdb" as user "postgres". localdb=# \x Expanded display is on. localdb=# table pg_foreign_server; -[ RECORD 1 ]----------------------------------------------------------------------------------------------- oid | 85462 srvname | remote_server srvowner | 10 srvfdw | 85458 srvtype | srvversion | srvacl | srvoptions | {dbname=remotedb,use_remote_estimate=true,fdw_startup_cost=0,fdw_tuple_cost=0,fetch_size=10000} localdb=# \x Expanded display is off. localdb=# \dt List of relations Schema | Name | Type | Owner --------+------+-------+---------- public | t | table | postgres (1 row) localdb=# \det remote_schema.remote_table; List of foreign tables Schema | Table | Server ---------------+--------------+--------------- remote_schema | remote_table | remote_server (1 row) localdb=# \c remotedb; You are now connected to database "remotedb" as user "postgres". remotedb=# \dt List of relations Schema | Name | Type | Owner --------+--------------+-------+---------- public | remote_table | table | postgres (1 row) remotedb=# select count(1) from remote_table; count -------- 100000 (1 row) remotedb=# \c localdb You are now connected to database "localdb" as user "postgres". localdb=# select count(1) from t; count ------- 10 (1 row) *# all the set options are forcing the optmizer to skip that plan route* localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=1.23..2817.97 rows=100000 width=16) (actual time=5.814..63.310 rows=90000 loops=1) Output: t.t_id, t.t_col, r.rt_id, r.t_id Inner Unique: true Hash Cond: (r.t_id = t.t_id) -> Foreign Scan on remote_schema.remote_table r (cost=0.00..2443.00 rows=100000 width=8) (actual time=5.797..47.329 rows=100000 loops=1) Output: r.rt_id, r.t_id *Remote SQL: SELECT rt_id, t_id FROM public.remote_table* -> Hash (cost=1.10..1.10 rows=10 width=8) (actual time=0.009..0.010 rows=10 loops=1) Output: t.t_id, t.t_col Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on public.t (cost=0.00..1.10 rows=10 width=8) (actual time=0.005..0.006 rows=10 loops=1) Output: t.t_id, t.t_col Planning Time: 4.464 ms Execution Time: 65.995 ms (14 rows) localdb=# set enable_seqscan TO 0; SET localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=12.41..2829.16 rows=100000 width=16) (actual time=5.380..61.028 rows=90000 loops=1) Output: t.t_id, t.t_col, r.rt_id, r.t_id Inner Unique: true Hash Cond: (r.t_id = t.t_id) -> Foreign Scan on remote_schema.remote_table r (cost=0.00..2443.00 rows=100000 width=8) (actual time=5.362..45.625 rows=100000 loops=1) Output: r.rt_id, r.t_id *Remote SQL: SELECT rt_id, t_id FROM public.remote_table* -> Hash (cost=12.29..12.29 rows=10 width=8) (actual time=0.011..0.011 rows=10 loops=1) Output: t.t_id, t.t_col Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Index Scan using t_pkey on public.t (cost=0.14..12.29 rows=10 width=8) (actual time=0.005..0.008 rows=10 loops=1) Output: t.t_id, t.t_col Planning Time: 0.696 ms Execution Time: 63.666 ms (14 rows) localdb=# set enable_hashjoin TO 0; SET localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.15..4821.93 rows=100000 width=16) (actual time=5.199..75.817 rows=90000 loops=1) Output: t.t_id, t.t_col, r.rt_id, r.t_id Inner Unique: true -> Foreign Scan on remote_schema.remote_table r (cost=0.00..2443.00 rows=100000 width=8) (actual time=5.186..46.152 rows=100000 loops=1) Output: r.rt_id, r.t_id *Remote SQL: SELECT rt_id, t_id FROM public.remote_table* -> Result Cache (cost=0.15..0.16 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=100000) Output: t.t_id, t.t_col Cache Key: r.t_id Hits: 99990 Misses: 10 Evictions: 0 Overflows: 0 Memory Usage: 2kB -> Index Scan using t_pkey on public.t (cost=0.14..0.15 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=10) Output: t.t_id, t.t_col Index Cond: (t.t_id = r.t_id) Planning Time: 0.692 ms Execution Time: 78.512 ms (15 rows) localdb=# set enable_resultcache TO 0; SET localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=479.10..5847.98 rows=100000 width=16) (actual time=12.855..66.094 rows=90000 loops=1) Output: t.t_id, t.t_col, r.rt_id, r.t_id Inner Unique: true Merge Cond: (r.t_id = t.t_id) -> Foreign Scan on remote_schema.remote_table r (cost=0.29..4586.89 rows=100000 width=8) (actual time=6.235..55.329 rows=100000 loops=1) Output: r.rt_id, r.t_id *Remote SQL: SELECT rt_id, t_id FROM public.remote_table ORDER BY t_id ASC NULLS LAST* -> Index Scan using t_pkey on public.t (cost=0.14..12.29 rows=10 width=8) (actual time=0.006..0.024 rows=9 loops=1) Output: t.t_id, t.t_col Planning Time: 0.704 ms Execution Time: 68.724 ms (11 rows) localdb=# set enable_mergejoin TO 0; SET localdb=# explain (analyze, verbose) select * from t join remote_schema.remote_table r on (t.t_id = r.t_id); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=113.94..8830.28 rows=100000 width=16) (actual time=11.576..89.465 rows=90000 loops=1) Output: t.t_id, t.t_col, r.rt_id, r.t_id -> Index Scan using t_pkey on public.t (cost=0.14..12.29 rows=10 width=8) (actual time=0.009..0.034 rows=10 loops=1) Output: t.t_id, t.t_col -> Foreign Scan on remote_schema.remote_table r (cost=113.80..781.80 rows=10000 width=8) (actual time=7.648..8.108 rows=9000 loops=10) Output: r.rt_id, r.t_id *Remote SQL: SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id))* Planning Time: 0.667 ms Execution Time: 92.131 ms (9 rows) from the logs for the last case: (it has open a new cursor everytime for each matching id) and is still the *slowest.* 2022-01-06 22:10:48.665 IST [2318] LOG: execute <unnamed>: DECLARE c1 CURSOR FOR SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id)) 2022-01-06 22:10:48.665 IST [2318] DETAIL: parameters: $1 = '1' 2022-01-06 22:10:48.665 IST [2318] LOG: statement: FETCH 10000 FROM c1 2022-01-06 22:10:48.679 IST [2318] LOG: statement: FETCH 10000 FROM c1 2022-01-06 22:10:48.679 IST [2318] LOG: statement: CLOSE c1 2022-01-06 22:10:48.679 IST [2318] LOG: execute <unnamed>: DECLARE c1 CURSOR FOR SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id)) 2022-01-06 22:10:48.679 IST [2318] DETAIL: parameters: $1 = '2' 2022-01-06 22:10:48.679 IST [2318] LOG: statement: FETCH 10000 FROM c1 2022-01-06 22:10:48.686 IST [2318] LOG: statement: FETCH 10000 FROM c1 2022-01-06 22:10:48.687 IST [2318] LOG: statement: CLOSE c1 2022-01-06 22:10:48.687 IST [2318] LOG: execute <unnamed>: DECLARE c1 CURSOR FOR SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id)) 2022-01-06 22:10:48.687 IST [2318] DETAIL: parameters: $1 = '3' 2022-01-06 22:10:48.687 IST [2318] LOG: statement: FETCH 10000 FROM c1 2022-01-06 22:10:48.698 IST [2318] LOG: statement: FETCH 10000 FROM c1 2022-01-06 22:10:48.698 IST [2318] LOG: statement: CLOSE c1 2022-01-06 22:10:48.698 IST [2318] LOG: execute <unnamed>: DECLARE c1 CURSOR FOR SELECT rt_id, t_id FROM public.remote_table WHERE (($1::integer = t_id)) 2022-01-06 22:10:48.698 IST [2318] DETAIL: parameters: $1 = '4' 2022-01-06 22:10:48.698 IST [2318] LOG: statement: FETCH 10000 FROM c1 2022-01-06 22:10:48.705 IST [2318] LOG: statement: FETCH 10000 FROM c1 2022-01-06 22:10:48.705 IST [2318] LOG: statement: CLOSE c1 so i think, i just trust the optimizer, or rewrite my so as to gather the predicate first locally and then pass them to remote, or use materialized views to maintain a stale copy of the remote table on my local db etc.