Hello All, We have the below query, which runs for ~45 seconds on a Postgres Aurora reader instance. I have captured the explain analyze output below. I want to understand where exactly the time and resources are being spent, and whether we can optimize the query further. It is a UI query that shows the top 50 rows and is expected to finish in under a second, but it takes ~45 seconds. I also see multiple workers in the plan; does that mean the query is running in parallel?
explain (analyze,verbose,costs,buffers)
select TAB1.PRI
from SCHEMA1.TAB1 TAB1
inner join SCHEMA1.TAB4 TAB4 on TAB4.PRI = TAB1.PRI
inner join SCHEMA1."TAB2" TAB2 on TAB2.PRI = TAB1.PRI
inner join SCHEMA1.TAB3 a2 on a2.AID = TAB2.AID
where TAB2.MID = XXXXX
and TAB4.TAB4_code = 'XX'
and TAB2.TAB2_time between '2024-01-01' and '2024-01-31'
order by TAB2.TAB2_time desc
limit 50;

Limit  (cost=13052924.01..13052924.14 rows=50 width=45) (actual time=45211.971..45224.720 rows=50 loops=1)
  Output: TAB1.PRI, TAB2.TAB2_time
  Buffers: shared hit=1980943 read=2335820
  I/O Timings: shared/local read=112477.014
  ->  Sort  (cost=13052924.01..13052924.19 rows=70 width=45) (actual time=45211.969..45224.713 rows=50 loops=1)
        Output: TAB1.PRI, TAB2.TAB2_time
        Sort Key: TAB2.TAB2_time DESC
        Sort Method: top-N heapsort  Memory: 32kB
        Buffers: shared hit=1980943 read=2335820
        I/O Timings: shared/local read=112477.014
        ->  Gather  (cost=92917.38..13052921.87 rows=70 width=45) (actual time=947.004..45221.915 rows=5428 loops=1)
              Output: TAB1.PRI, TAB2.TAB2_time
              Workers Planned: 4
              Workers Launched: 4
              Buffers: shared hit=1980943 read=2335820
              I/O Timings: shared/local read=112477.014
              ->  Nested Loop  (cost=91917.38..13051914.87 rows=18 width=45) (actual time=945.946..45195.224 rows=1086 loops=5)
                    Output: TAB1.PRI, TAB2.TAB2_time
                    Inner Unique: true
                    Buffers: shared hit=1980943 read=2335820
                    I/O Timings: shared/local read=112477.014
                    Worker 0: actual time=936.808..45193.518 rows=1036 loops=1
                      Buffers: shared hit=382606 read=465076
                      I/O Timings: shared/local read=22452.028
                    Worker 1: actual time=947.246..45194.168 rows=1055 loops=1
                      Buffers: shared hit=383165 read=484189
                      I/O Timings: shared/local read=22617.135
                    Worker 2: actual time=933.623..45192.534 rows=1145 loops=1
                      Buffers: shared hit=415758 read=473182
                      I/O Timings: shared/local read=22741.488
                    Worker 3: actual time=965.639..45193.603 rows=1078 loops=1
                      Buffers: shared hit=398009 read=449053
                      I/O Timings: shared/local read=22221.094
                    ->  Nested Loop  (cost=91916.81..13051828.80 rows=18 width=81) (actual time=945.917..43729.931 rows=1086 loops=5)
                          Output: TAB1.PRI, TAB2.TAB2_time, TAB2.AID
                          Inner Unique: true
                          Join Filter: ((TAB4.PRI)::text = (TAB1.PRI)::text)
                          Buffers: shared hit=1962289 read=2328363
                          I/O Timings: shared/local read=105246.220
                          Worker 0: actual time=936.781..43732.652 rows=1036 loops=1
                            Buffers: shared hit=379077 read=463587
                            I/O Timings: shared/local read=21008.508
                          Worker 1: actual time=947.212..43699.507 rows=1055 loops=1
                            Buffers: shared hit=379573 read=482704
                            I/O Timings: shared/local read=21142.572
                          Worker 2: actual time=933.589..43696.710 rows=1145 loops=1
                            Buffers: shared hit=411836 read=471634
                            I/O Timings: shared/local read=21266.581
                          Worker 3: actual time=965.608..43768.535 rows=1078 loops=1
                            Buffers: shared hit=394288 read=447583
                            I/O Timings: shared/local read=20814.288
                          ->  Parallel Hash Join  (cost=91916.24..13051765.39 rows=18 width=117) (actual time=945.879..42758.939 rows=1086 loops=5)
                                Output: TAB4.PRI, TAB2.TAB2_time, TAB2.PRI, TAB2.AID
                                Hash Cond: ((TAB2.PRI)::text = (TAB4.PRI)::text)
                                Buffers: shared hit=1943792 read=2322814
                                I/O Timings: shared/local read=100496.787
                                Worker 0: actual time=936.743..42798.247 rows=1036 loops=1
                                  Buffers: shared hit=375573 read=462501
                                  I/O Timings: shared/local read=20094.654
                                Worker 1: actual time=947.169..42752.987 rows=1055 loops=1
                                  Buffers: shared hit=375975 read=481619
                                  I/O Timings: shared/local read=20216.926
                                Worker 2: actual time=933.545..42660.854 rows=1145 loops=1
                                  Buffers: shared hit=407956 read=470465
                                  I/O Timings: shared/local read=20252.386
                                Worker 3: actual time=965.567..42797.288 rows=1078 loops=1
                                  Buffers: shared hit=390609 read=446481
                                  I/O Timings: shared/local read=19863.965
                                ->  Parallel Bitmap Heap Scan on SCHEMA1."TAB2" TAB2  (cost=84860.50..13040301.00 rows=1175611 width=80) (actual time=713.054..26942.082 rows=956249 loops=5)
                                      Output: TAB2.TAB2_time, TAB2.PRI, TAB2.AID
                                      Recheck Cond: (TAB2.MID = 'XXXXX'::numeric)
                                      Rows Removed by Index Recheck: 2137395
                                      Filter: ((TAB2.TAB2_time >= '2024-01-01 00:00:00+00'::timestamp with time zone) AND (TAB2.TAB2_time <= '2024-01-31 00:00:00+00'::timestamp with time zone))
                                      Heap Blocks: exact=5300 lossy=782577
                                      Buffers: shared hit=1651569 read=2245157
                                      I/O Timings: shared/local read=29063.286
                                      Worker 0: actual time=713.040..27006.980 rows=942051 loops=1
                                        Buffers: shared hit=317611 read=447013
                                        I/O Timings: shared/local read=5851.688
                                      Worker 1: actual time=713.047..27065.878 rows=939696 loops=1
                                        Buffers: shared hit=317632 read=466176
                                        I/O Timings: shared/local read=6038.851
                                      Worker 2: actual time=713.027..26894.506 rows=967468 loops=1
                                        Buffers: shared hit=349596 read=454912
                                        I/O Timings: shared/local read=5962.348
                                      Worker 3: actual time=713.091..26826.767 rows=961928 loops=1
                                        Buffers: shared hit=332980 read=430848
                                        I/O Timings: shared/local read=5426.475
                                      ->  Bitmap Index Scan on TAB2_idx2  (cost=0.00..83684.89 rows=4702443 width=0) (actual time=688.661..688.661 rows=4781245 loops=1)
                                            Index Cond: (TAB2.MID = 'XXXXX'::numeric)
                                            Buffers: shared hit=12408
                                            Worker 2: actual time=688.661..688.661 rows=4781245 loops=1
                                              Buffers: shared hit=12408
                                ->  Parallel Hash  (cost=7042.63..7042.63 rows=1049 width=37) (actual time=217.987..217.988 rows=27613 loops=5)
                                      Output: TAB4.PRI
                                      Buckets: 262144 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 13936kB
                                      Buffers: shared hit=134917
                                      Worker 0: actual time=214.981..214.982 rows=27779 loops=1
                                        Buffers: shared hit=27133
                                      Worker 1: actual time=215.455..215.456 rows=27805 loops=1
                                        Buffers: shared hit=27159
                                      Worker 2: actual time=215.774..215.774 rows=27330 loops=1
                                        Buffers: shared hit=26706
                                      Worker 3: actual time=215.776..215.777 rows=26880 loops=1
                                        Buffers: shared hit=26245
                                      ->  Parallel Bitmap Heap Scan on SCHEMA1.TAB4 TAB4  (cost=26.39..7042.63 rows=1049 width=37) (actual time=23.650..201.606 rows=27613 loops=5)
                                            Output: TAB4.PRI
                                            Recheck Cond: ((TAB4.TAB4_code)::text = 'XX'::text)
                                            Rows Removed by Index Recheck: 616610
                                            Heap Blocks: exact=11978 lossy=15624
                                            Buffers: shared hit=134917
                                            Worker 0: actual time=20.627..199.852 rows=27779 loops=1
                                              Buffers: shared hit=27133
                                            Worker 1: actual time=21.065..199.786 rows=27805 loops=1
                                              Buffers: shared hit=27159
                                            Worker 2: actual time=21.445..198.582 rows=27330 loops=1
                                              Buffers: shared hit=26706
                                            Worker 3: actual time=21.470..195.915 rows=26880 loops=1
                                              Buffers: shared hit=26245
                                            ->  Bitmap Index Scan on TAB4_idx1  (cost=0.00..25.95 rows=1784 width=0) (actual time=23.938..23.938 rows=138067 loops=1)
                                                  Index Cond: ((TAB4.TAB4_code)::text = 'XX'::text)
                                                  Buffers: shared hit=72
                          ->  Index Only Scan using TAB1_pk on SCHEMA1.TAB1 TAB1  (cost=0.57..3.51 rows=1 width=37) (actual time=0.891..0.891 rows=1 loops=5428)
                                Output: TAB1.PRI
                                Index Cond: (TAB1.PRI = (TAB2.PRI)::text)
                                Heap Fetches: 0
                                Buffers: shared hit=18262 read=5549
                                I/O Timings: shared/local read=4749.434
                                Worker 0: actual time=0.899..0.899 rows=1 loops=1036
                                  Buffers: shared hit=3464 read=1086
                                  I/O Timings: shared/local read=913.854
                                Worker 1: actual time=0.894..0.894 rows=1 loops=1055
                                  Buffers: shared hit=3558 read=1085
                                  I/O Timings: shared/local read=925.646
                                Worker 2: actual time=0.901..0.901 rows=1 loops=1145
                                  Buffers: shared hit=3840 read=1169
                                  I/O Timings: shared/local read=1014.196
                                Worker 3: actual time=0.898..0.898 rows=1 loops=1078
                                  Buffers: shared hit=3634 read=1102
                                  I/O Timings: shared/local read=950.323
                    ->  Index Only Scan using TAB3_pk on SCHEMA1.TAB3 a2  (cost=0.57..4.78 rows=1 width=36) (actual time=1.336..1.336 rows=1 loops=5428)
                          Output: a2.AID
                          Index Cond: (a2.AID = (TAB2.AID)::text)
                          Heap Fetches: 1836
                          Buffers: shared hit=18278 read=7398
                          I/O Timings: shared/local read=7172.664
                          Worker 0: actual time=1.393..1.393 rows=1 loops=1036
                            Buffers: shared hit=3455 read=1473
                            I/O Timings: shared/local read=1429.250
                          Worker 1: actual time=1.405..1.405 rows=1 loops=1055
                            Buffers: shared hit=3531 read=1476
                            I/O Timings: shared/local read=1464.637
                          Worker 2: actual time=1.296..1.296 rows=1 loops=1145
                            Buffers: shared hit=3857 read=1538
                            I/O Timings: shared/local read=1465.583
                          Worker 3: actual time=1.309..1.309 rows=1 loops=1078
                            Buffers: shared hit=3642 read=1459
                            I/O Timings: shared/local read=1395.946
Query Identifier: 7231829541130579109
Planning:
  Buffers: shared hit=1414
Planning Time: 1.305 ms
Execution Time: 45224.792 ms
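
As a rough back-of-the-envelope check on where the time goes, here is a small Python sketch using figures copied by hand from the plan above. The function name plan_ratios is made up for illustration. Treating the upper "actual time" of the TAB2 scan as its share of the total runtime is an assumption on my part, since the leader and the 4 workers run concurrently, so the ratios are indicative only.

```python
def plan_ratios():
    """Headline ratios from the EXPLAIN ANALYZE output (values copied by hand)."""
    execution_ms = 45224.792       # "Execution Time"
    tab2_scan_end_ms = 26942.082   # Parallel Bitmap Heap Scan on TAB2, upper actual time
    exact_blocks = 5300            # "Heap Blocks: exact" on the TAB2 scan
    lossy_blocks = 782577          # "Heap Blocks: lossy" on the TAB2 scan
    tab2_rows = 956249 * 5         # rows per loop * loops (leader + 4 workers)
    joined_rows = 5428             # rows arriving at the Gather node

    return {
        # Assumption: node end time / total time approximates the scan's share.
        "tab2_scan_share": tab2_scan_end_ms / execution_ms,
        "lossy_block_share": lossy_blocks / (exact_blocks + lossy_blocks),
        "tab2_rows": tab2_rows,
        "rows_surviving_joins": joined_rows / tab2_rows,
    }


if __name__ == "__main__":
    for name, value in plan_ratios().items():
        if isinstance(value, float):
            print(f"{name}: {value:.4f}")
        else:
            print(f"{name}: {value}")
```

If I am reading the numbers correctly, almost all heap blocks reported for the TAB2 scan are lossy (which, as I understand it, happens when the bitmap does not fit in work_mem), and only about 0.1% of the ~4.8 million TAB2 rows fetched survive the joins before the top 50 are picked.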