Hi David,

Thanks for the reply.This has optimized number of rows.

Can you please explain, why it is getting more columns in output, even
though we have asked for only one column ?


 EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)  SELECT pa.process_activity_id
AS pa_process_activity_id  FROM process_activity pa WHERE pa.app_id =
'126502930200650' AND pa.created > '1970-01-01 00:00:00'  AND EXISTS (
SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id  AND
pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created
limit 50;



   QUERY PLAN



----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
 Limit  (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629
rows=50 loops=1)
   Output: pa.process_activity_id, pa.process_instance_id, pa.created
   Buffers: shared hit=274950
   ->  Nested Loop Semi Join  (cost=1.14..266660108.78 rows=367790473
width=24) (actual time=821.282..891.607 rows=50 loops=1)
         Output: pa.process_activity_id, pa.process_instance_id, pa.created
         Buffers: shared hit=274950
         ->  Index Scan using
process_activity_process_instance_id_app_id_created_idx on
public.process_activity pa  (cost=0.70..262062725.21 rows=367790473
width=32) (actual time=821.253..891.517 rows=50 loops=1)


* Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url,
pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias,
pa.read_by_user, pa.source, pa.label_category_id, pa.label_id,
pa.csat_response_id, m.process_activity_fragments, pa.created, pa.updated,
pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments,
pa.status_fragment, pa.internal_meta, pa.interaction_id,
pa.do_not_translate, pa.should_translate, pa.in_reply_to*
               Index Cond: ((m.app_id = '126502930200650'::bigint) AND
(m.created > '1970-01-01 00:00:00'::timestamp without time zone))
               Buffers: shared hit=274946
         ->  Materialize  (cost=0.43..2.66 rows=1 width=8) (actual
time=0.001..0.001 rows=1 loops=50)
               Output: pi.app_id
               Buffers: shared hit=4
               ->  Index Scan using fki_conv_konotor_user_user_id on
public.process_instance pi  (cost=0.43..2.66 rows=1 width=8) (actual
time=0.020..0.020 rows=1 loops=1)
                     Output: pi.app_id
                     Index Cond: (pi.user_id = '137074931866340'::bigint)
                     Filter: (pi.app_id = '126502930200650'::bigint)
                     Buffers: shared hit=4
 Planning time: 0.297 ms
 Execution time: 891.686 ms
(20 rows)

On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johns...@gmail.com>
wrote:

> On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.kla...@aklaver.com>
> wrote:
>
>> On 5/7/20 4:19 AM, Amarendra Konda wrote:
>> > Hi,
>> >
>> > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
>> > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>> >
>> > We have noticed huge difference interms of execution plan ( response
>> > time) , When we pass the direct values  Vs  inner query to IN clause.
>> >
>> > High level details of the use case are as follows
>> >
>> >   * As part of the SQL there are 2 tables named Process_instance
>> >     (master) and Process_activity ( child)
>> >   * Wanted to fetch TOP 50 rows from  Process_activity table for the
>> >     given values of the Process_instance.
>> >   * When we used Inner Join / Inner query ( query1)  between parent
>> >     table and child table , LIMIT is not really taking in to account.
>> >     Instead it is fetching more rows and columns that required, and
>> >     finally limiting the result
>>
>> It is doing what you told it to do which is SELECT all
>> process_instance_i's for user_id='317079413683604' and app_id =
>> '427380312000560' and then filtering further. I am going to guess that
>> if you run the inner query alone you will find it returns ~23496 rows.
>> You might have better results if you an actual join between
>> process_activity and process_instance. Something like below(obviously
>> not tested):
>>
>
> What the OP seems to want is a semi-join:
>
> (not tested)
>
> SELECT pa.process_activity_id
> FROM process_activity pa WHERE pa.app_id = '427380312000560' AND
> pa.created > '1970-01-01 00:00:00'
> AND EXISTS (
>   SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND
> pi.user_id = '317079413683604'
> )
> ORDER BY
> pa.process_instance_id,
> pa.created limit 50;
>
> I'm unsure exactly how this will impact the plan choice but it should be
> an improvement, and in any case more correctly defines what it is you are
> looking for.
>
> David J.
>
>

Reply via email to