[PERFORM] index usage

2004-04-26 Thread brad-pgperf
Hi, 

I have a query which I think should be using an index all of the time but 
postgres only uses the index part of the time.  The index 
(ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed 
first followed by the selected column (support_person_id).  Wouldn't the 
most efficient plan be to scan the index each time because the only columns 
needed are in the index?  Below is the table, 2 queries showing the 
difference in plans, followed by the record distribution of ticket_crm_map.  
I first did a 'vacuum analyze' to update the statistics. 

Thanks,
Brad 

athenapost=> \d ticket_crm_map
 Table "public.ticket_crm_map"
   Column |Type | 
Modifiers
+-+- 
---
tcrm_map_id| integer | not null
ticket_id  | integer | not null
crm_id | integer | not null
support_person_id  | integer | not null
escalated_to_person_id | integer | not null
status | character varying(50)   | not null default 
'Open'::character varying
close_date | timestamp without time zone |
updated_date   | timestamp without time zone |
updated_by | character varying(255)  |
created_date   | timestamp without time zone |
created_by | character varying(255)  |
additional_info| text|
subject| character varying(255)  |
Indexes:
  "ticket_crm_map_pkey" primary key, btree (tcrm_map_id)
  "ticket_crm_map_crm_id_key" unique, btree (crm_id, ticket_id)
  "ticket_crm_map_crm_id_suppid" btree (crm_id, support_person_id)
  "ticket_crm_map_status" btree (status)
  "ticket_crm_map_ticket_id" btree (ticket_id)
Foreign-key constraints:
  "$1" FOREIGN KEY (ticket_id) REFERENCES ticket(ticket_id)
  "$2" FOREIGN KEY (crm_id) REFERENCES company_crm(crm_id)
  "$3" FOREIGN KEY (support_person_id) REFERENCES person(person_id)
  "$4" FOREIGN KEY (escalated_to_person_id) REFERENCES person(person_id)
  "$5" FOREIGN KEY (status) REFERENCES ticket_status(status) 

athenapost=> explain analyze select distinct support_person_id from 
ticket_crm_map where crm_id = 7;
  
QUERY PLAN
 
 
--
Unique  (cost=1262.99..1265.27 rows=1 width=4) (actual time=15.335..18.245 
rows=20 loops=1)
 ->  Sort  (cost=1262.99..1264.13 rows=456 width=4) (actual 
time=15.332..16.605 rows=2275 loops=1)
   Sort Key: support_person_id
   ->  Index Scan using ticket_crm_map_crm_id_suppid on ticket_crm_map  
(cost=0.00..1242.85 rows=456 width=4) (actual time=0.055..11.281 rows=2275 
loops=1)
 Index Cond: (crm_id = 7)
Total runtime: 18.553 ms
(6 rows) 

Time: 20.598 ms
athenapost=> explain analyze select distinct support_person_id from 
ticket_crm_map where crm_id = 1;
 QUERY PLAN
 
-
Unique  (cost=10911.12..11349.26 rows=32 width=4) (actual 
time=659.102..791.517 rows=24 loops=1)
 ->  Sort  (cost=10911.12..11130.19 rows=87628 width=4) (actual 
time=659.090..713.285 rows=93889 loops=1)
   Sort Key: support_person_id
   ->  Seq Scan on ticket_crm_map  (cost=0.00..3717.25 rows=87628 
width=4) (actual time=0.027..359.299 rows=93889 loops=1)
 Filter: (crm_id = 1)
Total runtime: 814.601 ms
(6 rows) 

Time: 817.095 ms
athenapost=> select count(*), crm_id from ticket_crm_map group by crm_id;
count | crm_id
---+
2554 | 63
 129 | 25
  17 | 24
 110 | 23
  74 | 22
  69 | 21
   2 | 20
  53 | 82
  10 | 17
  16 | 81
46637 | 16
  14 | 80
   2 | 15
1062 | 79
  87 | 78
  93 | 77
  60 | 44
 363 | 76
 225 | 10
   4 | 74
  83 |  9
  27 | 73
 182 |  8
2275 |  7
  15 | 71
 554 |  6
  44 | 70
 631 |  5
  37 |  4
 190 |  3
 112 |  2
93889 |  1
(32 rows) 

Time: 436.697 ms
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


[PERFORM] planner/optimizer question

2004-04-26 Thread brad-pgperf
Hi, 

I have a query which I think should be using an index all of the time but 
postgres only uses the index part of the time.  The index 
(ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed 
first followed by the selected column (support_person_id).  Wouldn't the 
most efficient plan be to scan the index regardless of crm_id because the 
only columns needed are in the index?  Below is the table, 2 queries showing 
the difference in plans, followed by the record distribution of 
ticket_crm_map.  I first did a 'vacuum analyze' and am running postgres 
7.4.2. 

Thanks,
Brad 

athenapost=> \d ticket_crm_map
   Table "public.ticket_crm_map"
 Column |Type | 
Modifiers
+-+- 
---
tcrm_map_id| integer | not null
ticket_id  | integer | not null
crm_id | integer | not null
support_person_id  | integer | not null
escalated_to_person_id | integer | not null
status | character varying(50)   | not null default 
'Open'::character varying
close_date | timestamp without time zone |
updated_date   | timestamp without time zone |
updated_by | character varying(255)  |
created_date   | timestamp without time zone |
created_by | character varying(255)  |
additional_info| text|
subject| character varying(255)  |
Indexes:
"ticket_crm_map_pkey" primary key, btree (tcrm_map_id)
"ticket_crm_map_crm_id_key" unique, btree (crm_id, ticket_id)
"ticket_crm_map_crm_id_suppid" btree (crm_id, support_person_id)
"ticket_crm_map_status" btree (status)
"ticket_crm_map_ticket_id" btree (ticket_id)
Foreign-key constraints:
"$1" FOREIGN KEY (ticket_id) REFERENCES ticket(ticket_id)
"$2" FOREIGN KEY (crm_id) REFERENCES company_crm(crm_id)
"$3" FOREIGN KEY (support_person_id) REFERENCES person(person_id)
"$4" FOREIGN KEY (escalated_to_person_id) REFERENCES person(person_id)
"$5" FOREIGN KEY (status) REFERENCES ticket_status(status) 

athenapost=> explain analyze select distinct support_person_id from 
ticket_crm_map where crm_id = 7;

QUERY PLAN
 
 
--
Unique  (cost=1262.99..1265.27 rows=1 width=4) (actual time=15.335..18.245 
rows=20 loops=1)
->  Sort  (cost=1262.99..1264.13 rows=456 width=4) (actual 
time=15.332..16.605 rows=2275 loops=1)
 Sort Key: support_person_id
 ->  Index Scan using ticket_crm_map_crm_id_suppid on ticket_crm_map  
(cost=0.00..1242.85 rows=456 width=4) (actual time=0.055..11.281 rows=2275 
loops=1)
   Index Cond: (crm_id = 7)
Total runtime: 18.553 ms
(6 rows) 

Time: 20.598 ms
athenapost=> explain analyze select distinct support_person_id from 
ticket_crm_map where crm_id = 1;
   QUERY PLAN
 
-
Unique  (cost=10911.12..11349.26 rows=32 width=4) (actual 
time=659.102..791.517 rows=24 loops=1)
->  Sort  (cost=10911.12..11130.19 rows=87628 width=4) (actual 
time=659.090..713.285 rows=93889 loops=1)
 Sort Key: support_person_id
 ->  Seq Scan on ticket_crm_map  (cost=0.00..3717.25 rows=87628 
width=4) (actual time=0.027..359.299 rows=93889 loops=1)
   Filter: (crm_id = 1)
Total runtime: 814.601 ms
(6 rows) 

Time: 817.095 ms
athenapost=> select count(*), crm_id from ticket_crm_map group by crm_id;
count | crm_id
---+
2554 | 63
129 | 25
17 | 24
110 | 23
74 | 22
69 | 21
 2 | 20
53 | 82
10 | 17
16 | 81
46637 | 16
14 | 80
 2 | 15
1062 | 79
87 | 78
93 | 77
60 | 44
363 | 76
225 | 10
 4 | 74
83 |  9
27 | 73
182 |  8
2275 |  7
15 | 71
554 |  6
44 | 70
631 |  5
37 |  4
190 |  3
112 |  2
93889 |  1
(32 rows) 

Time: 436.697 ms 

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] index usage

2004-04-26 Thread brad-pgperf
When checking an index in postgres the original table has to be checked for 
each result to find if the index entry is still valid?  In which case you 
can't blindly scan the whole index and assume the data is good. I was used 
to Oracle behavior where the index is up to date so it can do the scan 
without hitting the original table. 

Does this sound correct to anyone? 

Thanks,
Brad 

Stephan Szabo writes:
On Fri, 23 Apr 2004 [EMAIL PROTECTED] wrote: 

I have a query which I think should be using an index all of the time but
postgres only uses the index part of the time.  The index
(ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
first followed by the selected column (support_person_id).  Wouldn't the
most efficient plan be to scan the index each time because the only columns
needed are in the index?  Below is the table, 2 queries showing the
Not necessarily.  The rows in the actual file still need to be checked to
see if they're visible to the select and if it's expected that the entire
file (or a reasonable % of the pages anyway) will need to be loaded using
the index isn't necessarily a win. 


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])