Hi David, Thanks for the reply.This has optimized number of rows.
Can you please explain, why it is getting more columns in output, even though we have asked for only one column ? EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created limit 50; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------- Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1) Output: pa.process_activity_id, pa.process_instance_id, pa.created Buffers: shared hit=274950 -> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1) Output: pa.process_activity_id, pa.process_instance_id, pa.created Buffers: shared hit=274950 -> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1) * Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id, m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translate, pa.in_reply_to* Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone)) Buffers: shared hit=274946 -> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50) Output: pi.app_id Buffers: shared hit=4 -> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1) Output: pi.app_id Index Cond: (pi.user_id = '137074931866340'::bigint) Filter: (pi.app_id = '126502930200650'::bigint) Buffers: shared hit=4 Planning time: 0.297 ms Execution time: 891.686 ms (20 rows) On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johns...@gmail.com> wrote: > On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.kla...@aklaver.com> > wrote: > >> On 5/7/20 4:19 AM, Amarendra Konda wrote: >> > Hi, >> > >> > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled >> > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit >> > >> > We have noticed huge difference interms of execution plan ( response >> > time) , When we pass the direct values Vs inner query to IN clause. >> > >> > High level details of the use case are as follows >> > >> > * As part of the SQL there are 2 tables named Process_instance >> > (master) and Process_activity ( child) >> > * Wanted to fetch TOP 50 rows from Process_activity table for the >> > given values of the Process_instance. >> > * When we used Inner Join / Inner query ( query1) between parent >> > table and child table , LIMIT is not really taking in to account. >> > Instead it is fetching more rows and columns that required, and >> > finally limiting the result >> >> It is doing what you told it to do which is SELECT all >> process_instance_i's for user_id='317079413683604' and app_id = >> '427380312000560' and then filtering further. I am going to guess that >> if you run the inner query alone you will find it returns ~23496 rows. >> You might have better results if you an actual join between >> process_activity and process_instance. Something like below(obviously >> not tested): >> > > What the OP seems to want is a semi-join: > > (not tested) > > SELECT pa.process_activity_id > FROM process_activity pa WHERE pa.app_id = '427380312000560' AND > pa.created > '1970-01-01 00:00:00' > AND EXISTS ( > SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND > pi.user_id = '317079413683604' > ) > ORDER BY > pa.process_instance_id, > pa.created limit 50; > > I'm unsure exactly how this will impact the plan choice but it should be > an improvement, and in any case more correctly defines what it is you are > looking for. > > David J. > >