On 10/3/14, 11:21 AM, Jonathan Vanasco wrote:
I've been able to fix most of my slow queries into something more acceptable,
but I haven't been able to shave any time off this one. I'm hoping someone has
another strategy.
I have 2 tables:
resource
resource_2_tag
I want to calculate the top 25 "tag_ids" in "resource_2_tag " for resources that match a
given attribute on the "resource" table.
both tables have around 1.6million records.
If the database needs to warm up and read into cache, this can take 60seconds
to read the data off disk.
If the database doesn't need to warm up, it averages 1.76seconds.
The 1.76s time is troubling me.
Searching for the discrete elements of this is pretty lightweight.
here's an explain -- http://explain.depesz.com/s/PndC
I tried a subquery instead of a join, and the query optimized the plan to the
same.
i'm hoping someone will see something that I just don't see.
Table "public.resource_2_tag"
Column | Type | Modifiers
-----------------------+---------+-----------
resource_id | integer |
tag_id | integer |
Indexes:
"_idx_speed_resource_2_tag__resource_id" btree (resource_id)
"_idx_speed_resource_2_tag__tag_id" btree (tag_id)
Table "public.resource"
Column | Type |
Modifiers
-------------------------------------+-----------------------------+----------------------------------------------------------
id | integer | not null
default nextval('resource_id_seq'::regclass)
resource_attribute1_id | integer |
lots of other columns | |
Indexes:
"resource_attribute1_idx" btree (resource_attribute1_id)
--------------------------------------------------------------------------------
select count(*) from resource;
-- 1669729
select count(*) from resource_2_tag;
-- 1676594
select count(*) from resource where resource_attribute1_id = 614;
-- 5184
-- 4.386ms
select id from resource where resource_attribute1_id = 614;
-- 5184
-- 87.303ms
popping the 5k elements into an "in" clause, will run the query in around 100ms.
EXPLAIN ANALYZE
SELECT
resource_2_tag.tag_id AS resource_2_tag_tag_id,
count(resource_2_tag.tag_id) AS counted
FROM
resource_2_tag
JOIN resource ON resource.id = resource_2_tag.resource_id
WHERE
resource.resource_attribute1_id = 614
GROUP BY resource_2_tag.tag_id
ORDER BY counted DESC
LIMIT 25 OFFSET 0;
Don't join to the resource table; there's no reason to because you're not
pulling anything from it.
If for some reason you do need data out of the resource table, do the LIMIT 25
first, in a sub-select:
SELECT r.*, counted
FROM resource r
JOIN (
SELECT tag_id, count(*)
FROM resource_2_tag
GROUP BY tag_id
ORDER BY tag_id
LIMIT 25
) t ON ...
;
--
--
Jim Nasby, Data Architect, Blue Treble
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general