Hello,
I have a little 2-table JOIN, GROUP BY, ORDER BY query that does a sequential
scan on a multi-million row table. I _thought_ I had all the appropriate
indices, but apparently I do not. I was wondering if anyone can spot a way I
can speed up this query.
The query currently takes... *gulp*: 381119.201 ms :(
There are only 2 tables in the game: user_url and user_url_tag. The latter has
FKs pointing to the former. The sequential scan happens on the latter -
user_url_tag:
EXPLAIN ANALYZE select DISTINCT userurltag0_.tag as x0_0_, COUNT(*) as x1_0_
from user_url_tag userurltag0_, user_url userurl1_ WHERE (((userurl1_.user_id=1
)) AND ((userurltag0_.user_url_id=userurl1_.id ))) GROUP BY userurltag0_.tag
ORDER BY count(*) DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=140972.22..140996.28 rows=3207 width=10) (actual
time=381082.868..381110.094 rows=2546 loops=1)
-> Sort (cost=140972.22..140980.24 rows=3207 width=10) (actual
time=381082.858..381091.733 rows=2546 loops=1)
Sort Key: count(*), userurltag0_.tag
-> HashAggregate (cost=140777.45..140785.46 rows=3207 width=10)
(actual time=381032.844..381064.068 rows=2546 loops=1)
-> Hash Join (cost=2797.65..140758.50 rows=3790 width=10)
(actual time=248.530..380635.132 rows=8544 loops=1)
Hash Cond: ("outer".user_url_id = "inner".id)
-> Seq Scan on user_url_tag userurltag0_
(cost=0.00..106650.30 rows=6254530 width=14) (actual time=0.017..212256.630
rows=6259553 loops=1)
-> Hash (cost=2795.24..2795.24 rows=962 width=4) (actual
time=199.840..199.840 rows=0 loops=1)
-> Index Scan using ix_user_url_user_id_url_id on
user_url userurl1_ (cost=0.00..2795.24 rows=962 width=4) (actual
time=0.048..193.707 rows=1666 loops=1)
Index Cond: (user_id = 1)
Total runtime: 381119.201 ms
(11 rows)
This is what the two tables look like (extra colums removed):
Table "public.user_url_tag"
Column | Type | Modifiers
-------------+-----------------------+--------------------------------------------------------------
id | integer | not null default
nextval('public.user_url_tag_id_seq'::text)
user_url_id | integer |
tag | character varying(64) |
Indexes:
"pk_user_url_tag_id" PRIMARY KEY, btree (id)
"ix_user_url_tag_tag" btree (tag)
"ix_user_url_tag_user_url_id" btree (user_url_id)
Foreign-key constraints:
"fk_user_url_tag_user_url_id" FOREIGN KEY (user_url_id) REFERENCES
user_url(id)
Table "public.user_url"
Column | Type |
Modifiers
------------------+-----------------------------+----------------------------------------------------------
id | integer | not null default
nextval('public.user_url_id_seq'::text)
user_id | integer |
url_id | integer |
Indexes:
"pk_user_url_id" PRIMARY KEY, btree (id)
"ix_user_url_url_id_user_id" UNIQUE, btree (url_id, user_id)
"ix_user_url_user_id_url_id" UNIQUE, btree (user_id, url_id)
Does anyone see a way to speed up this s-l-o-w query?
I cache DB results, but I'd love to get rid of that sequential scan.
Thanks,
Otis
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq