Re: [GENERAL] Slow join over three tables

Tim van der Linden Wed, 27 Apr 2016 23:36:46 -0700

On Wed, 27 Apr 2016 13:48:06 +0200
Alban Hertroys <haram...@gmail.com> wrote:


> In this case, you're using the values in adverse to filter relevant rid's for 
> the FK join, so you might be better off with the inverse of above index:
>       create index on report_adverses (adverse, rid);
>       create index on report_drugs (drug, rid);

Hmmmm, like I reported yesterday this achieved a huge performance boost.

However, two (new) things I like to touch on while further experimenting with 
this query:

1. Altering Drug IDs or Adverse names effects the executing time negatively.

In this example altering the drug IDs I search for makes the query 6 times 
slower again:

SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug
FROM reports r
JOIN report_drugs d ON d.rid = r.id
JOIN report_adverses a ON a.rid = r.id 
WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back 
pain', 'back pain']) 
AND d.drug = ANY (ARRAY[9557, 17848, 17880, 18223]) ORDER BY r.created;

Different drug ID ([9557, 17848, 17880, 18223]), but the rest is the same. 
Query plan:

 Sort  (cost=31409.71..31417.48 rows=3107 width=41) (actual 
time=662.707..662.819 rows=4076 loops=1)
   Sort Key: r.created
   Sort Method: quicksort  Memory: 415kB
   ->  Nested Loop  (cost=25693.17..31229.48 rows=3107 width=41) (actual 
time=71.748..661.743 rows=4076 loops=1)
         ->  Merge Join  (cost=25692.61..26795.53 rows=3107 width=29) (actual 
time=70.841..97.094 rows=4076 loops=1)
               Merge Cond: (d.rid = a.rid)
               ->  Sort  (cost=16332.08..16671.61 rows=135810 width=8) (actual 
time=48.946..58.623 rows=135370 loops=1)
                     Sort Key: d.rid
                     Sort Method: quicksort  Memory: 12498kB
                     ->  Index Scan using report_drugs_drug_idx on report_drugs 
d  (cost=0.44..4753.44 rows=135810 width=8) (actual time=0.681..28.441 
rows=135532 loops=1)
                           Index Cond: (drug = ANY 
('{9557,17848,17880,18223}'::integer[]))
               ->  Sort  (cost=9360.53..9556.94 rows=78565 width=21) (actual 
time=21.880..25.969 rows=77163 loops=1)
                     Sort Key: a.rid
                     Sort Method: quicksort  Memory: 6682kB
                     ->  Index Only Scan using report_adverses_adverse_rid_idx 
on report_adverses a  (cost=0.56..2972.57 rows=78565 width=21) (actual 
time=0.983..10.744 rows=76972 loops=1)
                           Index Cond: (adverse = ANY ('{"back pain - 
nonspecific","nonspecific back pain","back pain"}'::text[]))
                           Heap Fetches: 0
         ->  Index Only Scan using reports_id_age_gender_created_idx on reports 
r  (cost=0.56..1.42 rows=1 width=20) (actual time=0.137..0.138 rows=1 
loops=4076)
               Index Cond: (id = d.rid)
               Heap Fetches: 0
 Planning time: 16.438 ms
 Execution time: 663.583 ms

A total of 660 ms instead of the previous 120 ms. The amount of rows returned 
are 4076, instead of the 448 rows of the previous query. Could this be the 
cause of the slow down? Or is it the fact that the integers in the ANY clause 
are further apart from one another and thus make for longer index searches?

The same happens with changing the adverses, some adverses come back in ~120 
ms, others can take up to one second.

2. Adding extra columns ignores indexes

This one was kind of expected. Adding extra columns from both "report_drugs" 
and "report_adverses" slows the lot down again. Extra columns added:

- a.recovery
- a.severity
- d.reason
- d.effectiveness
- d.duration

As I would expect, adding these columns would make the previous multi-column 
indexes useless. And they indeed were not used anymore.

So my initial thought was to create new indexes on the columns I now query:

- CREATE INDEX ON report_adverses(adverse, rid, severity, recovery);
- CREATE INDEX ON report_drugs(drug, rid, reason, effectiveness, duration);

After running the query again, the new index on "report_adverses" got picked 
up, but the index on "report_drugs" did not:

 Sort  (cost=12365.79..12366.61 rows=329 width=76) (actual 
time=129.106..129.120 rows=448 loops=1)
   Sort Key: r.created
   Sort Method: quicksort  Memory: 66kB
   ->  Nested Loop  (cost=11212.93..12352.04 rows=329 width=76) (actual 
time=31.558..128.951 rows=448 loops=1)
         Join Filter: (d.rid = r.id)
         ->  Merge Join  (cost=11212.38..11680.44 rows=329 width=64) (actual 
time=30.705..39.527 rows=448 loops=1)
               Merge Cond: (d.rid = a.rid)
               ->  Sort  (cost=1503.85..1539.82 rows=14390 width=35) (actual 
time=6.977..7.993 rows=14074 loops=1)
                     Sort Key: d.rid
                     Sort Method: quicksort  Memory: 1202kB
                     ->  Index Scan using report_drugs_drug_idx on report_drugs 
d  (cost=0.44..510.02 rows=14390 width=35) (actual time=0.567..4.638 rows=14200 
loops=1)
                           Index Cond: (drug = ANY 
('{359,360,361,362,363}'::integer[]))
               ->  Sort  (cost=9708.53..9904.94 rows=78565 width=29) (actual 
time=23.717..26.540 rows=76974 loops=1)
                     Sort Key: a.rid
                     Sort Method: quicksort  Memory: 6702kB
                     ->  Index Only Scan using 
report_adverses_adverse_rid_severity_recovery_idx on report_adverses a  
(cost=0.56..3320.57 rows=78565 width=29) (actual time=0.878..12.297 rows=76972 
loops=1)
                           Index Cond: (adverse = ANY ('{"back pain - 
nonspecific","nonspecific back pain","back pain"}'::text[]))
                           Heap Fetches: 0
         ->  Index Only Scan using reports_id_age_gender_created_idx on reports 
r  (cost=0.56..2.03 rows=1 width=20) (actual time=0.198..0.199 rows=1 loops=448)
               Index Cond: (id = a.rid)
               Heap Fetches: 0
 Planning time: 18.310 ms
 Execution time: 129.483 ms

Still fast enough ... but I was wondering why the planner would not use the new 
index and instead fall back on the "report_drugs_drug_idx" single column index.


> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.

Cheers,
Tim


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Slow join over three tables

Reply via email to