Re: [PERFORM] UNION versus SUB SELECT

Robert DiFalco Fri, 22 Nov 2013 09:37:58 -0800

So far that one was the worst in terms of cost and time. Here are all the
plans with buffers, more records, and results being returned. At this point
I have good enough performance with my UNION approach but I'm just trying
to learn now. WHY is the union approach the fastest? I would have expected
the EXISTS or IN approaches to be faster or at least have the SAME cost? At
this point I just want to understand.


dft1fjfv106r48=> explain (analyze, buffers, verbose)
select *
from contacts c
where EXISTS(
   (select 1 from phone_numbers p where c.id = p.contact_id and p.national
= 5038904993 and p.e164 = '+15038904993')
   union
   (select 1 id = e.contact_id and e.email = 'robert.x...@gmail.com'));

    QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=0.00..23238.90 rows=1425 width=95)
(actual time=2.241..46.817 rows=7 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call,
c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: (SubPlan 1)
   Rows Removed by Filter: 2843
   Buffers: shared hit=11497
   SubPlan 1
     ->  Unique  (cost=8.13..8.13 rows=2 width=0) (actual time=0.015..0.015
rows=0 loops=2850)
           Output: (1)
           Buffers: shared hit=11440
           ->  Sort  (cost=8.13..8.13 rows=2 width=0) (actual
time=0.013..0.013 rows=0 loops=2850)
                 Output: (1)
                 Sort Key: (1)
                 Sort Method: quicksort  Memory: 25kB
                 Buffers: shared hit=11440
                 ->  Append  (cost=0.06..8.13 rows=2 width=0) (actual
time=0.009..0.009 rows=0 loops=2850)
                       Buffers: shared hit=11440
                       ->  Index Only Scan using idx_phone on
public.phone_numbers p  (cost=0.06..4.06 rows=1 width=0) (actual
time=0.003..0.003 rows=0 loops=2850)
                             Output: 1
                             Index Cond: ((p.contact_id = c.id) AND
(p."national" = 5038904993::bigint) AND (p.e164 = '+15038904993'::text))
                             Heap Fetches: 11
                             Buffers: shared hit=5721
                       ->  Index Only Scan using idx_email_full on
public.email_addresses e  (cost=0.06..4.06 rows=1 width=0) (actual
time=0.003..0.003 rows=0 loops=2850)
                             Output: 1
                             Index Cond: ((e.contact_id = c.id) AND
(e.email = 'robert.x...@gmail.com'::text))
                             Heap Fetches: 5
                             Buffers: shared hit=5719
 Total runtime: 46.897 ms
(27 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose)
select *
from contacts where id IN (
  (select c.id
  from contacts c join phone_numbers p on c.id = p.contact_id and
p.national = 5038904993 and p.e164 = '+15038904993')
  union
  (select c.id from contacts c join email_addresses e on c.id =
e.contact_id and e.email = 'robert.x...@gmail.com'));

                                  QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=29.38..53.74 rows=6 width=95) (actual time=0.356..0.418
rows=7 loops=1)
   Output: contacts.id, contacts.owner_id, contacts.user_id,
contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype,
contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname,
contacts.fb_id
   Buffers: shared hit=87
   ->  HashAggregate  (cost=29.32..29.34 rows=6 width=8) (actual
time=0.347..0.354 rows=7 loops=1)
         Output: c.id
         Buffers: shared hit=66
         ->  Append  (cost=0.11..29.32 rows=6 width=8) (actual
time=0.047..0.316 rows=16 loops=1)
               Buffers: shared hit=66
               ->  Nested Loop  (cost=0.11..8.12 rows=1 width=8) (actual
time=0.045..0.169 rows=11 loops=1)
                     Output: c.id
                     Buffers: shared hit=43
                     ->  Index Scan using idx_phone_address on
public.phone_numbers p  (cost=0.06..4.06 rows=1 width=8) (actual
time=0.027..0.047 rows=11 loops=1)
                           Output: p.id, p.contact_id, p."national",
p.e164, p.raw_number
                           Index Cond: ((p."national" = 5038904993::bigint)
AND ((p.e164)::text = '+15038904993'::text))
                           Buffers: shared hit=9
                     ->  Index Only Scan using idx_contacts_pkey_owner on
public.contacts c  (cost=0.06..4.06 rows=1 width=8) (actual
time=0.005..0.006 rows=1 loops=11)
                           Output: c.id, c.owner_id, c.user_id
                           Index Cond: (c.id = p.contact_id)
                           Heap Fetches: 11
                           Buffers: shared hit=34
               ->  Nested Loop  (cost=2.12..21.17 rows=5 width=8) (actual
time=0.057..0.114 rows=5 loops=1)
                     Output: c_1.id
                     Buffers: shared hit=23
                     ->  Bitmap Heap Scan on public.email_addresses e
 (cost=2.06..8.85 rows=5 width=8) (actual time=0.044..0.055 rows=5 loops=1)
                           Output: e.id, e.contact_id, e.email
                           Recheck Cond: ((e.email)::text = '
robert.x...@gmail.com'::text)
                           Buffers: shared hit=7
                           ->  Bitmap Index Scan on idx_email_address
 (cost=0.00..2.06 rows=5 width=0) (actual time=0.031..0.031 rows=6 loops=1)
                                 Index Cond: ((e.email)::text = '
robert.x...@gmail.com'::text)
                                 Buffers: shared hit=2
                     ->  Index Only Scan using idx_contacts_pkey_owner on
public.contacts c_1  (cost=0.06..2.46 rows=1 width=8) (actual
time=0.005..0.006 rows=1 loops=5)
                           Output: c_1.id, c_1.owner_id, c_1.user_id
                           Index Cond: (c_1.id = e.contact_id)
                           Heap Fetches: 5
                           Buffers: shared hit=16
   ->  Index Scan using idx_contacts_pkey_owner on public.contacts
 (cost=0.06..4.06 rows=1 width=95) (actual time=0.003..0.004 rows=1 loops=7)
         Output: contacts.id, contacts.owner_id, contacts.user_id,
contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype,
contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname,
contacts.fb_id
         Index Cond: (contacts.id = c.id)
         Buffers: shared hit=21
 Total runtime: 0.535 ms
(40 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose)
select c.*
from contacts c
where EXISTS(select 1 from phone_numbers p where (p.national = 5038904993
and p.e164 = '+15038904993') and p.contact_id = c.id)
   or EXISTS(select 1 from email_addresses e where e.email = '
robert.x...@gmail.com' and e.contact_id = c.id);
                                                                     QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=0.00..23213.25 rows=2138 width=95)
(actual time=0.209..1.290 rows=7 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call,
c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives:
SubPlan 3 or hashed SubPlan 4))
   Rows Removed by Filter: 2843
   Buffers: shared hit=73
   SubPlan 1
     ->  Index Only Scan using idx_phone on public.phone_numbers p
 (cost=0.06..4.06 rows=1 width=0) (never executed)
           Index Cond: ((p.contact_id = c.id) AND (p."national" =
5038904993::bigint) AND (p.e164 = '+15038904993'::text))
           Heap Fetches: 0
   SubPlan 2
     ->  Index Scan using idx_phone_address on public.phone_numbers p_1
 (cost=0.06..4.06 rows=1 width=8) (actual time=0.033..0.056 rows=11 loops=1)
           Output: p_1.contact_id
           Index Cond: ((p_1."national" = 5038904994::bigint) AND
((p_1.e164)::text = '+15038904993'::text))
           Buffers: shared hit=9
   SubPlan 3
     ->  Index Only Scan using idx_email_full on public.email_addresses e
 (cost=0.06..4.06 rows=1 width=0) (never executed)
           Index Cond: ((e.contact_id = c.id) AND (e.email = '
robert.x...@gmail.com'::text))
           Heap Fetches: 0
   SubPlan 4
     ->  Bitmap Heap Scan on public.email_addresses e_1  (cost=2.06..8.85
rows=5 width=8) (actual time=0.040..0.050 rows=5 loops=1)
           Output: e_1.contact_id
           Recheck Cond: ((e_1.email)::text = 'robert.x...@gmail.com'::text)
           Buffers: shared hit=7
           ->  Bitmap Index Scan on idx_email_address  (cost=0.00..2.06
rows=5 width=0) (actual time=0.030..0.030 rows=6 loops=1)
                 Index Cond: ((e_1.email)::text = 'robert.x...@gmail.com
'::text)
                 Buffers: shared hit=2
 Total runtime: 1.395 ms
(27 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose)
select c.*
from contacts c
where c.id IN (
    select p.contact_id from phone_numbers p where (p.national = 5038904993
and p.e164 = '+15038904993'))
or c.id IN (
    select e.contact_id from email_addresses e where e.email = '
robert.x...@gmail.com');
                                                                    QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=12.92..81.32 rows=2138 width=95)
(actual time=0.208..1.283 rows=7 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call,
c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
   Rows Removed by Filter: 2843
   Buffers: shared hit=73
   SubPlan 1
     ->  Index Scan using idx_phone_address on public.phone_numbers p
 (cost=0.06..4.06 rows=1 width=8) (actual time=0.032..0.054 rows=11 loops=1)
           Output: p.contact_id
           Index Cond: ((p."national" = 5038904993::bigint) AND
((p.e164)::text = '+15038904993'::text))
           Buffers: shared hit=9
   SubPlan 2
     ->  Bitmap Heap Scan on public.email_addresses e  (cost=2.06..8.85
rows=5 width=8) (actual time=0.040..0.049 rows=5 loops=1)
           Output: e.contact_id
           Recheck Cond: ((e.email)::text = 'robert.x...@gmail.com'::text)
           Buffers: shared hit=7
           ->  Bitmap Index Scan on idx_email_address  (cost=0.00..2.06
rows=5 width=0) (actual time=0.031..0.031 rows=6 loops=1)
                 Index Cond: ((e.email)::text = 'robert.x...@gmail.com
'::text)
                 Buffers: shared hit=2
 Total runtime: 1.371 ms
(19 rows)

dft1fjfv106r48=>


On Fri, Nov 22, 2013 at 7:54 AM, Merlin Moncure <mmonc...@gmail.com> wrote:

> On Thu, Nov 21, 2013 at 2:31 PM, desmodemone <desmodem...@gmail.com>
> wrote:
> > Hi Robert, could you try with "exists" ?
> >
> > SELECT c.*
> > FROM contacts c
> > WHERE  exists  ( SELECT  1 FROM phone p WHERE p.addr =? and
> > p.contact_id=c.id )
> > OR exists (SELECT  1 FROM email e WHERE e.addr = ? and  e.contact_id=
> c.id );
>
> hm, how about:
> SELECT c.*
> FROM contacts c
> WHERE  exists  (
>   SELECT  1
>   FROM phone p
>   WHERE p.addr =? AND p.contact_id=c.id
>   UNION ALL
>   SELECT  1 FROM email e
>   WHERE e.addr = ? AND e.contact_id=c.id
> );
>
> merlin
>

Re: [PERFORM] UNION versus SUB SELECT

Reply via email to