Re: [PERFORM] UNION versus SUB SELECT

Robert DiFalco Thu, 21 Nov 2013 14:12:39 -0800

Hmmmm...I'm not sure why the buffers option didn't work for me, maybe the
heroku psql is out of date. No, the query gets slower with a high load of
data and runs pretty often.


I just created a small test dataset for this. When I have a larger one I
will post new explain plans but the timings seem pretty consistent
regardless of the results returns (usually only 2-200) even when there are
millions of records in "contacts", "phone_numbers", and "email_addresses".

In this case doesn't the correlated query have to do more work and access
more columns than the subselect approach?


On Thu, Nov 21, 2013 at 1:22 PM, Elliot <yields.falseh...@gmail.com> wrote:

>  On 2013-11-21 16:12, Robert DiFalco wrote:
>
> Sorry I couldn't get buffers to work but here is the explain analyze
> verbose:
>
>  dft1fjfv106r48=> explain analyze verbose select c.*
>
>                    from contacts c
>
>                                    where c.id IN (
>
>                                                          select
> p.contact_id
>
>          from phone_numbers p
>
>                         where (p.national = 5038904993 and p.e164 = '
> +15038904993'))
>                                             or c.id IN (
>
>                                                                   select
> e.contact_id
>
>          from email_addresses e
>
>                         where e.email = 'robert.xx...@gmail.com')
>
>                                      ;
>                                                                      QUERY
> PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on public.contacts c  (cost=8.12..75.73 rows=1988 width=95)
> (actual time=0.410..0.410 rows=0 loops=1)
>    Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call,
> c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
>    Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
>    Rows Removed by Filter: 2849
>    SubPlan 1
>      ->  Index Scan using idx_phone_address on public.phone_numbers p
>  (cost=0.06..4.06 rows=1 width=8) (actual time=0.015..0.015 rows=0 loops=1)
>            Output: p.contact_id
>            Index Cond: ((p."national" = 5038904993::bigint) AND
> ((p.e164)::text = '+15038904993'::text))
>    SubPlan 2
>      ->  Index Scan using idx_email_address on public.email_addresses e
>  (cost=0.06..4.06 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
>            Output: e.contact_id
>            Index Cond: ((e.email)::text = 'robert.xx...@gmail.com'::text)
>  Total runtime: 0.489 ms
> (13 rows)
>
>  dft1fjfv106r48=> explain analyze verbose select c.*
>
>                    from contacts c
>
>                                    where exists(
>
>                                                        select 1
>
>                                                                       from
> phone_numbers p
>
>           where (p.national = 5038904993 and p.e164 = '+15038904993') and
> p.contact_id = c.id)
>                           or EXISTS(
>
>                                               select 1
>
>                                                               from
> email_addresses e
>
>           where e.email = 'robert.xx...@gmail.com' and e.contact_id = c.id)
>
>                      ;
>
> QUERY PLAN
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on public.contacts c  (cost=0.00..21596.38 rows=1988 width=95)
> (actual time=0.479..0.479 rows=0 loops=1)
>    Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call,
> c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
>    Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR
> (alternatives: SubPlan 3 or hashed SubPlan 4))
>    Rows Removed by Filter: 2849
>    SubPlan 1
>      ->  Index Scan using idx_phone_address on public.phone_numbers p
>  (cost=0.06..4.06 rows=1 width=0) (never executed)
>            Index Cond: ((p."national" = 5038904993::bigint) AND
> ((p.e164)::text = '+15038904993'::text))
>            Filter: (p.contact_id = c.id)
>    SubPlan 2
>      ->  Index Scan using idx_phone_address on public.phone_numbers p_1
>  (cost=0.06..4.06 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
>            Output: p_1.contact_id
>            Index Cond: ((p_1."national" = 5038904993::bigint) AND
> ((p_1.e164)::text = '+15038904993'::text))
>    SubPlan 3
>      ->  Index Scan using idx_email_address on public.email_addresses e
>  (cost=0.06..4.06 rows=1 width=0) (never executed)
>            Index Cond: ((e.email)::text = 'robert.xx...@gmail.com'::text)
>            Filter: (e.contact_id = c.id)
>    SubPlan 4
>      ->  Index Scan using idx_email_address on public.email_addresses e_1
>  (cost=0.06..4.06 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=1)
>            Output: e_1.contact_id
>            Index Cond: ((e_1.email)::text = 'robert.xx...@gmail.com
> '::text)
>  Total runtime: 0.559 ms
>  (21 rows)
>
>  dft1fjfv106r48=> explain analyze verbose select * from contacts where id
> IN (
>                    (select c.id
>
>                                     from contacts c
>
>                                                    join phone_numbers p on
> c.id = p.contact_id and p.national = 5038904993 and p.e164 = '+15038904993')
>                                                                   union
> (select c.id
>
>             from contacts c
>
>                            join email_addresses e on c.id = e.contact_id
> and e.email = 'robert.xx...@gmail.com'));
>
>
>      QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=16.31..24.39 rows=2 width=95) (actual
> time=0.060..0.060 rows=0 loops=1)
>    Output: contacts.id, contacts.owner_id, contacts.user_id,
> contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype,
> contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname,
> contacts.fb_id
>    ->  Unique  (cost=16.26..16.26 rows=2 width=8) (actual
> time=0.057..0.057 rows=0 loops=1)
>          Output: c.id
>          ->  Sort  (cost=16.26..16.26 rows=2 width=8) (actual
> time=0.055..0.055 rows=0 loops=1)
>                Output: c.id
>                Sort Key: c.id
>                Sort Method: quicksort  Memory: 25kB
>                ->  Append  (cost=0.11..16.25 rows=2 width=8) (actual
> time=0.034..0.034 rows=0 loops=1)
>                      ->  Nested Loop  (cost=0.11..8.12 rows=1 width=8)
> (actual time=0.013..0.013 rows=0 loops=1)
>                            Output: c.id
>                            ->  Index Scan using idx_phone_address on
> public.phone_numbers p  (cost=0.06..4.06 rows=1 width=8) (actual
> time=0.011..0.011 rows=0 loops=1)
>                                  Output: p.id, p.contact_id,
> p."national", p.e164, p.raw_number
>                                  Index Cond: ((p."national" =
> 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
>                            ->  Index Only Scan using
> idx_contacts_pkey_owner on public.contacts c  (cost=0.06..4.06 rows=1
> width=8) (never executed)
>                                  Output: c.id, c.owner_id, c.user_id
>                                  Index Cond: (c.id = p.contact_id)
>                                  Heap Fetches: 0
>                      ->  Nested Loop  (cost=0.11..8.12 rows=1 width=8)
> (actual time=0.018..0.018 rows=0 loops=1)
>                            Output: c_1.id
>                            ->  Index Scan using idx_email_address on
> public.email_addresses e  (cost=0.06..4.06 rows=1 width=8) (actual
> time=0.016..0.016 rows=0 loops=1)
>                                  Output: e.id, e.contact_id, e.email
>                                  Index Cond: ((e.email)::text = '
> robert.xx...@gmail.com'::text)
>                            ->  Index Only Scan using
> idx_contacts_pkey_owner on public.contacts c_1  (cost=0.06..4.06 rows=1
> width=8) (never executed)
>                                  Output: c_1.id, c_1.owner_id, c_1.user_id
>                                  Index Cond: (c_1.id = e.contact_id)
>                                  Heap Fetches: 0
>    ->  Index Scan using idx_contacts_pkey_owner on public.contacts
>  (cost=0.06..4.06 rows=1 width=95) (never executed)
>          Output: contacts.id, contacts.owner_id, contacts.user_id,
> contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype,
> contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname,
> contacts.fb_id
>          Index Cond: (contacts.id = c.id)
>  Total runtime: 0.332 ms
> (31 rows)
>
>
>
>  The buffers option is 9.0+ and is used like "explain (analyze, verbose,
> buffers) select 1".
> To your original question, the union output there runs slightly faster
> than the "in" approach, although this may not be a good example - your
> inputs don't return any data, so this might not be realistic - and those
> numbers are so low that the difference might just be noise.
> Are you tuning a <0.5ms-runtime query? Or is this just curiosity? FWIW I
> tend to write queries like this using an exists check first, then if that's
> still not good enough (all things like proper indexing taken in to account)
> I'll try an in check, then finally a union if that's still not good enough.
>
>

Re: [PERFORM] UNION versus SUB SELECT

Reply via email to