Hi Adrian, Thanks for the reply. And i have kept latest execution plans, for various SQL statements ( inner join, sub queries and placing values instead of sub query) . As suggested, tried with INNER JOIN, however result was similar to subquery.
Is there any way we can tell the optimiser to process less number of rows based on the LIMIT value ? ( i.e. may be SQL re-write) ? *INNER SQL* EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pi.process_instance_id AS pi_process_instance_id FROM process_instance pi WHERE pi.user_id = '137074931866340' AND pi.app_id = '126502930200650'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) *(actual time=0.018..0.019 rows=2 loops=1)* Output: process_instance_id Index Cond: (pi.user_id = '137074931866340'::bigint) Filter: (pi.app_id = '126502930200650'::bigint) Buffers: shared hit=5 Planning time: 0.119 ms Execution time: 0.041 ms *Full query - Sub query* EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (SELECT pi.process_instance_id AS pi_process_instance_id FROM process_instance pi WHERE pi.user_id = '137074931866340' AND pi.app_id = '126502930200650') ORDER BY pa.process_instance_id, pa.created limit 50; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------- Limit (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.386..744.415 rows=50 loops=1) Output: pa.process_activity_id, pa.process_instance_id, pa.created Buffers: shared hit=3760 read=39316 -> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.384..744.396 rows=50 loops=1) Output: pa.process_activity_id, pa.process_instance_id, pa.created Sort Key: pa.process_instance_id, pa.created Sort Method: top-N heapsort Memory: 28kB Buffers: shared hit=3760 read=39316 -> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual time=0.044..727.297 rows=47011 loops=1) Output: pa.process_activity_id, pa.process_instance_id, pa.created Buffers: shared hit=3754 read=39316 -> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) *(actual time=0.009..0.015 rows=2 loops=1)* Output: pi.process_instance_id Index Cond: (pi.user_id = '137074931866340'::bigint) Filter: (pi.app_id = '126502930200650'::bigint) Buffers: shared hit=5 -> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) *(actual time=0.029..349.000 rows=23506 loops=2)* Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr anslate, pa.in_reply_to Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone)) Buffers: shared hit=3749 read=39316 Planning time: 2.547 ms Execution time: 744.499 ms (22 rows) *Full query - INNER JOIN* EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa INNER JOIN process_instance pi ON pi.process_instance_id = pa.process_instance_id AND pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pi.user_id = '137074931866340' AND pi.app_id = '126502930200650' ORDER BY pa.process_instance_id, pa.created limit 50; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------- Limit (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.834 rows=50 loops=1) Output: pa.process_activity_id, pa.process_instance_id, pa.created Buffers: shared hit=43070 -> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.815 rows=50 loops=1) Output: pa.process_activity_id, pa.process_instance_id, pa.created Sort Key: pa.process_instance_id, pa.created Sort Method: top-N heapsort Memory: 28kB Buffers: shared hit=43070 -> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual time=0.030..73.847 rows=47011 loops=1) Output: pa.process_activity_id, pa.process_instance_id, pa.created Buffers: shared hit=43070 -> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) *(actual time=0.015..0.018 rows=2 loops=1)* Output: pi.process_instance_id Index Cond: (pi.user_id = '137074931866340'::bigint) Filter: (pi.app_id = '126502930200650'::bigint) Buffers: shared hit=5 -> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) *(actual time=0.011..21.447 rows=23506 loops=2)* Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr anslate, pa.in_reply_to Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone)) Buffers: shared hit=43065 Planning time: 0.428 ms Execution time: 87.905 ms *FULL Query - INNER SQL replaced with result* EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS m_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (*137074941043913,164357609323111*) ORDER BY pa.process_instance_id,pa.created limit 50; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------- Limit (cost=0.70..37.65 rows=50 width=24) (actual time=0.016..0.095 rows=50 loops=1) Output: process_activity_id, process_instance_id, created Buffers: shared hit=55 -> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..2100.39 rows=2841 width=24) *(actual time=0.015..0.077 rows=50 loops=1)* Output: process_activity_id, process_instance_id, created Index Cond: ((pa.process_instance_id = ANY ('{137074941043913,164357609323111}'::bigint[])) AND (pa.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone)) Buffers: shared hit=55 Planning time: 1.710 ms Execution time: 0.147 ms Regards, Amar On Thu, May 7, 2020 at 8:10 PM Adrian Klaver <adrian.kla...@aklaver.com> wrote: > On 5/7/20 4:19 AM, Amarendra Konda wrote: > > Hi, > > > > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled > > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit > > > > We have noticed huge difference interms of execution plan ( response > > time) , When we pass the direct values Vs inner query to IN clause. > > > > High level details of the use case are as follows > > > > * As part of the SQL there are 2 tables named Process_instance > > (master) and Process_activity ( child) > > * Wanted to fetch TOP 50 rows from Process_activity table for the > > given values of the Process_instance. > > * When we used Inner Join / Inner query ( query1) between parent > > table and child table , LIMIT is not really taking in to account. > > Instead it is fetching more rows and columns that required, and > > finally limiting the result > > It is doing what you told it to do which is SELECT all > process_instance_i's for user_id='317079413683604' and app_id = > '427380312000560' and then filtering further. I am going to guess that > if you run the inner query alone you will find it returns ~23496 rows. > You might have better results if you an actual join between > process_activity and process_instance. Something like below(obviously > not tested): > > SELECT > pa.process_activity_id > FROM > process_activity pa > JOIN > process_instance pi > ON > pa.process_instance_id = pi.process_instance_id > WHERE > pa.app_id = '427380312000560' > AND > pa.created > '1970-01-01 00:00:00' > AND > pi.user_id = '317079413683604' > ORDER BY > pa.process_instance_id, > pa.created > LIMIT 50; > > The second query is not equivalent as you are not filtering on user_id > and you are filtering on only three process_instance_id's. > > > > * > > > > > > *Query1* > > > > web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT > > pa.process_activity_id FROM process_activity pa WHERE pa.app_id = > > '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND > > pa.process_instance_id in *_(SELECT pi.process_instance_id FROM > > process_instance pi WHERE pi.user_id = '317079413683604' AND pi.app_id = > > '427380312000560')_* ORDER BY pa.process_instance_id,pa.created limit 50; > > > > > > QUERY > PLAN > > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > Limit (cost=1071.47..1071.55 rows=31 width=24) (actual > > time=85.958..85.991 rows=50 loops=1) > > Output: pa.process_activity_id, pa.process_instance_id, pa.created > > Buffers: shared hit=43065 > > -> Sort (cost=1071.47..1071.55 rows=31 width=24) (actual > > time=85.956..85.971 rows=50 loops=1) > > Output: pa.process_activity_id, pa.process_instance_id, > pa.created > > Sort Key: pa.process_instance_id, pa.created > > Sort Method: top-N heapsort Memory: 28kB > > Buffers: shared hit=43065 > > -> Nested Loop (cost=1.14..1070.70 rows=31 width=24) (actual > > time=0.031..72.183 rows=46992 loops=1) > > Output: pa.process_activity_id, pa.process_instance_id, > > pa.created > > Buffers: shared hit=43065 > > -> Index Scan using fki_conv_konotor_user_user_id on > > public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual > > time=0.010..0.013 rows=2 loops=1) > > Output: pi.process_instance_id > > Index Cond: (pi.user_id = > '317079413683604'::bigint) > > Filter: (pi.app_id = '427380312000560'::bigint) > > Buffers: shared hit=5 > > -> Index Scan using > > process_activity_process_instance_id_app_id_created_idx on > > public.process_activity pa (cost=0.70..1053.80 rows=1425 width=24) > > (actual time=0.015..20.702 rows=*23496* loops=2) > > * Output: pa.process_activity_id, pa.process_activity_type, > > pa.voice_url, pa.process_activity_user_id, pa.app_id, > > pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, > > pa.label_category_id, pa.label_id, pa.csat_response_id, > > pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, > pa.market > > ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, > > pa.internal_meta, pa.interaction_id, pa.do_not_translate, > > pa.should_translate, pa.in_reply_to* > > Index Cond: ((pa.process_instance_id = > > pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND > > (pa.created > '1970-01-01 00:00:00'::timestamp without time zone)) > > Buffers: shared hit=43060 > > Planning time: 0.499 ms > > Execution time: 86.040 ms > > (22 rows) > > > > *_Query 2_* > > > > web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT > > pa.process_activity_id AS m_process_activity_id FROM process_activity m > > WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 > > 00:00:00' AND pa.process_instance_id in > > (*240117466018927,325820556706970,433008275197305*) ORDER BY > > pa.process_instance_id,pa.created limit 50; > > > > QUERY PLAN > > > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Limit (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094 > > rows=50 loops=1) > > Output: process_activity_id, process_instance_id, created > > Buffers: shared hit=50 > > -> Index Scan using > > process_activity_process_instance_id_app_id_created_idx on > > public.process_activity pa (cost=0.70..3124.97 rows=4226 width=24) > > (actual time=0.022..0.079 *rows=50* loops=1) > > Output: process_activity_id, process_instance_id, created > > Index Cond: ((pa.process_instance_id = ANY > > ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND > > (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 > > 00:00:00'::timestamp without time zone)) > > Buffers: shared hit=50 > > Planning time: 0.167 ms > > Execution time: 0.137 ms > > (9 rows) > > > > > > Can someone explain > > > > * Why It is fetching more columns and more rows, incase of inner query > ? > > * Is there any option to really limit values with INNER JOIN, INNER > > query ? If yes, can you please share information on this ? > > > > Thanks in advance for your time and suggestions. > > > > Regards, Amar > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >