"Tom Lane" <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: >> "Tom Lane" <[EMAIL PROTECTED]> writes: >>> Nested Loop (cost=5.39..198.81 rows=51 width=244) >>> -> HashAggregate (cost=1.06..1.11 rows=5 width=4) >>> -> Seq Scan on int4_tbl b (cost=0.00..1.05 rows=5 width=4) >>> -> Bitmap Heap Scan on tenk1 a (cost=4.33..39.41 rows=10 width=244) >>> Recheck Cond: (a.thousand = b.f1) >>> -> Bitmap Index Scan on tenk1_thous_tenthous (cost=0.00..4.33 rows=10 >>> width=0) >>> Index Cond: (a.thousand = b.f1) >>> (7 rows) > >> Sure, but that's still re-executing the bitmap index scan 51 times -- >> possibly >> having to fetch the same records off disk repeatedly.
Sorry, 5 times, not 51 (once per row coming out of the HashAggregate).

> It's not fetching any record repeatedly, because the HashAggregate
> step eliminated duplicate keys on the other side.

Only because it's an IN query. If it were a normal join (which hash joins are
perfectly capable of handling), then it would be. As it is, it could be
fetching the same page repeatedly, just not precisely the same tuples.

It happens that transforming this query to

  explain analyze
    select * from tenk1 a where thousand = any (array(select f1 from int4_tbl b));

makes it run about 40% faster. That could just be that the hash table is small
enough that a linear search is more efficient than calling hashint4, though.
(Perhaps we should be protecting against that in dynahash, actually.)

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!