Hi, I am having a rare issue with an extremely inefficient merge join. The query plan indicates that PG is doing some kind of nested loop, although an index is present.
PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

Schema of dir_current (some columns left out for brevity):

starfish=# \d sf.dir_current
                              Table "sf.dir_current"
       Column       |    Type     | Collation | Nullable |               Default
--------------------+-------------+-----------+----------+---------------------------------------
 id                 | bigint      |           | not null | nextval('sf.object_id_seq'::regclass)
 volume_id          | bigint      |           | not null |
 parent_id          | bigint      |           |          |
 blocks             | sf.blkcnt_t |           |          |
 rec_aggrs          | jsonb       |           | not null |
Indexes:
    "dir_current_pk" PRIMARY KEY, btree (id), tablespace "sf_current"
    "dir_current_parentid_idx" btree (parent_id), tablespace "sf_current"
    "dir_current_volume_id_out_of_sync_time_idx" btree (volume_id, out_of_sync_time) WHERE out_of_sync_time IS NOT NULL, tablespace "sf_current"
    "dir_current_volume_id_path_unq_idx" UNIQUE, btree (volume_id, path text_pattern_ops), tablespace "sf_current"
    "dir_current_volumeid_id_unq" UNIQUE CONSTRAINT, btree (volume_id, id), tablespace "sf_current"
Foreign-key constraints:
    "dir_current_parentid_fk" FOREIGN KEY (parent_id) REFERENCES sf.dir_current(id) DEFERRABLE INITIALLY DEFERRED

dir_process is created as a temporary table:

CREATE TEMP TABLE dir_process AS (
    SELECT sf.dir_current.id, volume_id, parent_id, depth, size, blocks,
           atime, ctime, mtime, sync_time, local_aggrs
    FROM sf.dir_current
    WHERE ....
);
CREATE INDEX dir_process_indx ON dir_process(volume_id, id);
ANALYZE dir_process;

and usually contains a few thousand rows.

Slow query:

SELECT dir.id, dir.volume_id, dir.parent_id, dir.rec_aggrs, dir.blocks
FROM sf.dir_current AS dir
INNER JOIN dir_process
        ON dir.parent_id = dir_process.id
       AND dir.volume_id = dir_process.volume_id
WHERE dir.volume_id = ANY(volume_ids)

dir_current contains around 750M rows altogether, and there are ca. 1.75M rows with volume_id = 5.
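For completeness, a single occurrence can be reproduced by hand like this (substituting '{5}'::bigint[] for the volume_ids parameter, which is how it appears in the logged plans below; the LOG lines themselves come from duration-based plan logging):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT dir.id, dir.volume_id, dir.parent_id, dir.rec_aggrs, dir.blocks
FROM sf.dir_current AS dir
INNER JOIN dir_process
        ON dir.parent_id = dir_process.id
       AND dir.volume_id = dir_process.volume_id
WHERE dir.volume_id = ANY('{5}'::bigint[]);
```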
Sometimes Postgres will choose a very inefficient plan, which involves looping many times over the same rows, producing hundreds of millions or billions of rows:

LOG: duration: 1125530.496 ms plan:
Merge Join  (cost=909.42..40484.01 rows=1 width=456) (actual rows=1 loops=1)
  Merge Cond: (dir.volume_id = dir_process.volume_id)
  Join Filter: (dir.parent_id = dir_process.id)
  Rows Removed by Join Filter: 13583132483
  ->  Index Scan using dir_current_volumeid_id_unq on dir_current dir  (cost=0.12..884052.46 rows=601329 width=456) (actual rows=2000756 loops=1)
        Index Cond: (volume_id = ANY ('{5}'::bigint[]))
  ->  Sort  (cost=909.31..912.70 rows=6789 width=16) (actual rows=13581131729 loops=1)
        Sort Key: dir_process.volume_id
        Sort Method: quicksort  Memory: 511kB
        ->  Seq Scan on dir_process  (cost=0.00..822.89 rows=6789 width=16) (actual rows=6789 loops=1)

LOG: duration: 3923310.224 ms plan:
Merge Join  (cost=0.17..4324.64 rows=1 width=456) (actual rows=529 loops=1)
  Merge Cond: (dir_process.volume_id = dir.volume_id)
  Join Filter: (dir.parent_id = dir_process.id)
  Rows Removed by Join Filter: 831113021
  ->  Index Only Scan using dir_process_indx on dir_process  (cost=0.05..245.00 rows=450 width=16) (actual rows=450 loops=1)
        Heap Fetches: 450
  ->  Index Scan using dir_current_volumeid_id_unq on dir_current dir  (cost=0.12..884052.46 rows=601329 width=456) (actual rows=831113101 loops=1)
        Index Cond: (volume_id = ANY ('{5}'::bigint[]))

LOG: duration: 10140968.829 ms plan:
Merge Join  (cost=0.17..8389.13 rows=1 width=456) (actual rows=819 loops=1)
  Merge Cond: (dir_process.volume_id = dir.volume_id)
  Join Filter: (dir.parent_id = dir_process.id)
  Rows Removed by Join Filter: 2153506735
  ->  Index Only Scan using dir_process_indx on dir_process  (cost=0.06..659.76 rows=1166 width=16) (actual rows=1166 loops=1)
        Heap Fetches: 1166
  ->  Index Scan using dir_current_volumeid_id_unq on dir_current dir  (cost=0.12..885276.20 rows=602172 width=456) (actual rows=2153506389 loops=1)
        Index Cond: (volume_id = ANY ('{5}'::bigint[]))

Note that 2153506389 / 1166 = 1 846 918. Similarly, 831113101 / 450 = 1 846 918.

I wonder how I can help the Postgres query planner choose a faster plan?

-- Marcin Gozdalik
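My reading of those ratios (an interpretation of the plans, not something I have confirmed) is that the merge join rescans the entire volume_id = 5 slice of dir_current once per dir_process row, since volume_id is the only merge key and it is constant here; 1 846 918 would then be the number of dir rows scanned per rescan:

```python
# Sanity check of the "actual rows" figures in the plans above, assuming the
# merge join rescans rows_per_rescan dir rows once per dir_process row.
rows_per_rescan = 1846918

# Plan with 450 dir_process rows: index scan on dir reported actual rows=831113101
print(450 * rows_per_rescan)    # 831113100, within one row of 831113101

# Plan with 1166 dir_process rows: index scan on dir reported actual rows=2153506389
print(1166 * rows_per_rescan)   # 2153506388, within one row of 2153506389
```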