Adam Gundy wrote:
> I'm hitting an unexpected problem with postgres 8.3 - I have some
> tables which use varchar(32) for their unique IDs which I'm attempting
> to join using some simple SQL:
> select *
> from group_access, groups
> where group_access.groupid = groups.groupid and
>       group_access.uid = '7275359408f44591d0717e16890ce335';
> there's a unique index on group_access.groupid, and a non-unique index
> on groups.groupid. both are non-null.
What about group_access.uid? I'd have thought an index on that, plus
the groups primary key, is probably the sensible combination here.
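Something like this is what I had in mind - untested, just a sketch
using the table and column names from your query (the index name is
only an example, and skip the primary key bit if groups already has
one):

-- an index the uid filter can use
CREATE INDEX group_access_uid_idx ON group_access (uid);
-- and a primary key (i.e. a unique index) on groups.groupid
ALTER TABLE groups ADD PRIMARY KEY (groupid);
-- refresh the statistics afterwards
ANALYZE group_access;
ANALYZE groups;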
> the problem is: if groupid (in both tables) is varchar, I cannot force
> postgres (no matter how hard I try) to do an index scan. it ends up
> reading the entire groups table (pretty large!):
OK
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
> Hash Join  (cost=8.89..41329.88 rows=119940 width=287) (actual time=0.202..935.136 rows=981 loops=1)
That's because it's expecting 119,940 rows to match (rather than the
981 you actually get). If you really were getting that many results,
this would probably be a sensible plan.
>   Hash Cond: ((groups.groupid)::text = (group_access.groupid)::text)
>   ->  Seq Scan on groups  (cost=0.00..31696.48 rows=1123348 width=177) (actual time=0.011..446.091 rows=1125239 loops=1)
It's got a good idea of the total number of rows in groups (there's a
quick way to check what it's working from, after the plan below).
>   ->  Hash  (cost=8.51..8.51 rows=30 width=110) (actual time=0.148..0.148 rows=30 loops=1)
>         ->  Seq Scan on group_access  (cost=0.00..8.51 rows=30 width=110) (actual time=0.014..0.126 rows=30 loops=1)
And likewise for group_access. Oh, and the seq scan doesn't really
matter here - it probably *is* faster to read all 30 rows in one burst
than to go to the index and then back to the table.
>               Filter: ((uid)::text = '7275359408f44591d0717e16890ce335'::text)
> Total runtime: 935.443 ms
> (7 rows)
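The row estimate for a bare seq scan comes more or less straight from
the catalogue. A quick way to see what's stored there (these are only
estimates, kept up to date by VACUUM/ANALYZE):

-- the planner's stored size estimates for both tables
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('groups', 'group_access');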
> if I disable seq_scan, I get this:
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------
> Merge Join  (cost=1.47..106189.61 rows=120004 width=287) (actual time=0.100..1532.353 rows=981 loops=1)
It's still thinking it's going to get 120 thousand rows.
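If the estimate stays that far out even after a fresh ANALYZE, it might
be worth raising the statistics target on the join columns and
re-analysing - again just a sketch with your column names (the default
target in 8.3 is only 10):

-- gather more detailed statistics for the join columns
ALTER TABLE groups ALTER COLUMN groupid SET STATISTICS 100;
ALTER TABLE group_access ALTER COLUMN groupid SET STATISTICS 100;
ANALYZE groups;
ANALYZE group_access;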
> it's running an index scan across the entire table (no condition applied) :-(
> so, just for the hell of it, I tried making groupid a char(32),
> despite repeated assertions in this group that there's no performance
> difference between the two:
There's no performance difference between the two.
> Nested Loop  (cost=4.48..253.85 rows=304 width=291) (actual time=0.715..22.906 rows=984 loops=1)
> (this last plan is actually against a smaller test DB, but I get the
> same behavior with it, seq scan for varchar or index scan for char,
> and the results returned are identical for this query)
The char(32) thing isn't important here; what is important is that it's
expecting ~300 rows rather than 120,000. It's still wrong, but it's
close enough to make sense.
So the question is: why is PG expecting so many matches for your join?
How many distinct values do you have in groups.groupid and
group_access.groupid?
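Something like the following would tell us - both the real numbers and
what the planner believes it knows (a negative n_distinct in pg_stats
means a fraction of the row count rather than an absolute number):

-- the actual distinct counts
SELECT count(DISTINCT groupid) FROM groups;
SELECT count(DISTINCT groupid) FROM group_access;
-- and what the planner's statistics say
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename IN ('groups', 'group_access')
  AND attname = 'groupid';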
--
Richard Huxton
Archonet Ltd