Thank you very much for your prompt responses. I have analysed this further and found the long-running query.
I ran EXPLAIN ANALYZE on this query and got the following results. (We have two identical DB instances containing the same data. Instance 1 took 20+ seconds to process the query and instance 2 took less than a second.)

Instance 1 (used by regular users - more than 600,000 requests a day; the result is the same even when there are no users on the server):

Nested Loop Semi Join  (cost=998547.53..3319573.36 rows=1 width=8) (actual time=10568.217..22945.971 rows=22 loops=1)
  ->  Hash Semi Join  (cost=998546.96..3319545.95 rows=41 width=16) (actual time=10568.198..22945.663 rows=22 loops=1)
        Hash Cond: (node.id = prop.node_id)
        ->  Bitmap Heap Scan on alf_node node  (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
              Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
              Rows Removed by Index Recheck: 54239131
              Filter: (NOT (hashed SubPlan 1))
              Rows Removed by Filter: 2816
              Heap Blocks: exact=24301 lossy=1875383
              ->  Bitmap Index Scan on idx_alf_node_mdq  (cost=0.00..646144.01 rows=20047144 width=0) (actual time=3232.067..3232.067 rows=44246360 loops=1)
                    Index Cond: ((store_id = 6) AND (type_qname_id = 240))
              SubPlan 1
                ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=2503.51..347403.58 rows=128379 width=8) (actual time=25.447..65.392 rows=5635 loops=1)
                      Recheck Cond: (qname_id = 251)
                      Heap Blocks: exact=40765
                      ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..2471.41 rows=128379 width=0) (actual time=18.835..18.835 rows=239610 loops=1)
                            Index Cond: (qname_id = 251)
        ->  Hash  (cost=3526.11..3526.11 rows=871 width=8) (actual time=0.045..0.045 rows=23 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3526.11 rows=871 width=8) (actual time=0.021..0.042 rows=23 loops=1)
                    Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
                    Heap Fetches: 23
  ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..2.01 rows=15 width=8) (actual time=0.011..0.011 rows=1 loops=22)
        Index Cond: ((node_id = node.id) AND (qname_id = 245))
        Heap Fetches: 22
Planning time: 0.639 ms
Execution time: 22946.036 ms

Instance 2 (used only by testers - 250 requests a day):

Nested Loop Semi Join  (cost=6471.94..173560841.08 rows=2 width=8) (actual time=0.162..0.464 rows=17 loops=1)
  ->  Nested Loop  (cost=6471.37..173560684.36 rows=45 width=16) (actual time=0.154..0.387 rows=17 loops=1)
        ->  HashAggregate  (cost=3508.15..3516.80 rows=865 width=8) (actual time=0.041..0.047 rows=18 loops=1)
              Group Key: prop.node_id
              ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3505.99 rows=866 width=8) (actual time=0.020..0.035 rows=18 loops=1)
                    Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
                    Heap Fetches: 18
        ->  Index Scan using alf_node_pkey on alf_node node  (cost=2963.22..200644.11 rows=1 width=8) (actual time=0.019..0.019 rows=1 loops=18)
              Index Cond: (id = prop.node_id)
              Filter: ((type_qname_id <> 145) AND (store_id = 6) AND (type_qname_id = 240) AND (NOT (SubPlan 1)))
              Rows Removed by Filter: 0
              SubPlan 1
                ->  Materialize  (cost=2962.65..397912.89 rows=158204 width=8) (actual time=0.001..0.009 rows=85 loops=17)
                      ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=2962.65..396503.87 rows=158204 width=8) (actual time=0.021..0.082 rows=85 loops=1)
                            Recheck Cond: (qname_id = 251)
                            Heap Blocks: exact=55
                            ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..2923.10 rows=158204 width=0) (actual time=0.015..0.015 rows=87 loops=1)
                                  Index Cond: (qname_id = 251)
  ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..34.32 rows=12 width=8) (actual time=0.004..0.004 rows=1 loops=17)
        Index Cond: ((node_id = node.id) AND (qname_id = 245))
        Heap Fetches: 17
Planning time: 0.623 ms
Execution time: 0.540 ms
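One difference that stands out between the two plans is the bitmap heap scan on Instance 1 reporting "Heap Blocks: exact=24301 lossy=1875383"; lossy blocks generally mean the bitmap did not fit in work_mem, which forces the expensive recheck of millions of rows. A minimal sketch of how this could be re-tested at the session level follows (the 256MB value and the elided query are placeholders, not recommendations):

    -- check the current setting
    SHOW work_mem;

    -- raise it for this session only and re-run the slow query with buffer statistics
    SET work_mem = '256MB';           -- placeholder value, tune to your memory budget
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT ...;                       -- the original search query goes here
    RESET work_mem;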
The configuration is the same on both servers. Please advise me on this. Is there any configuration I specifically need to look at, such as work_mem, shared_buffers, checkpoint_segments, effective_cache_size, enable_seqscan or checkpoint_completion_target?

Thanks in advance.

Fahiz

On 9 Dec 2019, 19:03 +0000, Michael Lewis <mle...@entrata.com>, wrote:
> > > There is a specific search query I am running to get a list of documents
> > > and their metadata from several tables in the DB.
> > > We are running Postgres 9.6.9 on Amazon RDS (db.m5.4xlarge instance).
> > >
> > > Our current DB consists of 500GB of data and indexes. The largest table
> > > consists of 454,078,915 rows.
> > >
> > > With a fresh DB, restored from the data without any indexes, the search
> > > query performs relatively quickly and most of the time takes less than a
> > > second.
> > >
> > > But after 3 weeks of use of the DB it suddenly started to slow down, only
> > > for this particular query, and it takes 20+ seconds to respond. If I
> > > restore the DB again then it continues to work fine, and the symptom pops
> > > up again after about 3 weeks.
> >
> > You haven't been quite clear on the situation and your use case, but
> > assuming this table has 454 million rows and experiences updates/deletes,
> > this sounds like you may be having problems with autovacuum. Have you
> > customized parameters to ensure it is running more frequently than the
> > default? How are you doing those data restores? Perhaps that process is
> > cleaning up the accumulated bloat and you can run fine again for a while.
> > Check pg_stat_user_tables for the last (auto)vacuum that ran, assuming you
> > didn't just restore again and are expecting the issue to occur again soon.
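For reference, the pg_stat_user_tables check suggested above could look something like the query below (a sketch only; the table names are taken from the plans earlier in the thread, so adjust the list to match the tables your query touches):

    -- when were the big tables last vacuumed/analyzed, and how many dead tuples have built up?
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           last_vacuum,
           last_autovacuum,
           last_analyze,
           last_autoanalyze
    FROM   pg_stat_user_tables
    WHERE  relname IN ('alf_node', 'alf_node_properties', 'alf_node_aspects');

If last_autovacuum is old (or NULL) and n_dead_tup is large relative to n_live_tup on the instance that slows down, that would support the autovacuum/bloat theory described above.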