On Thu, 28 Apr 2016 16:38:53 +0200 Alban Hertroys <haram...@gmail.com> wrote:
Hi Alban First off ... thanks you for your time on this so far and giving me the educational smacks on the head :) I appreciate this a lot. > You're doing ~9 times as many index lookups. A slowdown of 6x of this > part of the query seems rather reasonable. Again, very true. But why 9 times as much? > > Planning time: 16.438 ms > > Execution time: 663.583 ms > > > > A total of 660 ms instead of the previous 120 ms. The amount of rows > > returned are 4076, instead of the 448 rows of the previous query. Could > > this be the cause of the slow down? Or is it the fact that the integers in > > the ANY clause are further apart from one another and thus make for longer > > index searches? > > Technically probably both, but the index node distance (assuming a > btree index) should be insignificant. It is indeed a b-tree index. > The sorting of found candidate rows on rid probably contributes the > most. I forgot how quicksort scales to the number of items to sort, > probably something like 2log(n) or something like that. Looked this up and you are right, a quicksort (in C) is of nlog(n) complexity ... and if I'm right this is "rather" linear. Less sorting is thus the message :) > That said, you didn't happen to perform these tests (assuming >1) > during a busier period of the database server? I ask because later on > you seem to be getting fast results again after some more index > changes and I don't see cause for the difference. No, these test are on a local development machine with nothing else going on but the database cluster. > > 2. Adding extra columns ignores indexes > > ... > > - a.recovery > > - a.severity > > - d.reason > > - d.effectiveness > > - d.duration > > I have to admit that I'm a bit behind with the current state of the > art of PostgreSQL, but last time I checked, the database needed to > look at each row in the result-set for transaction visibility > information. I recall that there was (at some point) much discussion > whether that visibility information could be added to indexes and that > there were strong arguments against doing so. Hence, I doubt that > those new index-only scans skip that step. I see. > Unless I'm wrong there, adding non-queried fields to the index is only > going to affect your performance adversely. Relevant fields for > indices are those used in joins, those regularly used in conditions > (where-clauses) and those that are regularly sorted over. > > Other fields are available from the candidate rows of the result set. > Putting them in the index just results in storing redundant > information. Storing redundant information and making for bloated indexes which the planner might choose to skip if I understand correctly. Good to know. > > ... > > Still fast enough ... but I was wondering why the planner would not use the > > new index and instead fall back on the "report_drugs_drug_idx" single > > column index. > > I'm actually a bit suspicious of those numbers; those are different > drug id's than those from the first explain too. You are completely right, this was the wrong plan ... the correct plan is: Sort (cost=31757.71..31765.48 rows=3107 width=76) (actual time=722.348..722.461 rows=4076 loops=1) Sort Key: r.created Sort Method: quicksort Memory: 495kB -> Nested Loop (cost=26041.17..31577.48 rows=3107 width=76) (actual time=139.407..721.090 rows=4076 loops=1) -> Merge Join (cost=26040.61..27143.53 rows=3107 width=64) (actual time=139.396..170.446 rows=4076 loops=1) Merge Cond: (d.rid = a.rid) -> Sort (cost=16332.08..16671.61 rows=135810 width=35) (actual time=108.866..119.143 rows=135370 loops=1) Sort Key: d.rid Sort Method: quicksort Memory: 13992kB -> Index Scan using report_drugs_drug_idx on report_drugs d (cost=0.44..4753.44 rows=135810 width=35) (actual time=0.038..53.758 rows=135532 loops=1) Index Cond: (drug = ANY ('{9557,17848,17880,18223}'::integer[])) -> Sort (cost=9708.53..9904.94 rows=78565 width=29) (actual time=30.517..34.876 rows=77163 loops=1) Sort Key: a.rid Sort Method: quicksort Memory: 6702kB -> Index Only Scan using report_adverses_adverse_rid_severity_recovery_idx on report_adverses a (cost=0.56..3320.57 rows=78565 width=29) (actual time=1.005..16.135 rows=76972 loops=1) Index Cond: (adverse = ANY ('{"back pain - nonspecific","nonspecific back pain","back pain"}'::text[])) Heap Fetches: 0 -> Index Only Scan using reports_id_age_gender_created_idx on reports r (cost=0.56..1.42 rows=1 width=20) (actual time=0.134..0.135 rows=1 loops=4076) Index Cond: (id = d.rid) Heap Fetches: 0 Planning time: 29.415 ms Execution time: 723.545 ms And this is now indeed much closer to the ~660 ms from before, it doesn't make much of a difference after all. Cheers, Tim -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general