On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23j...@gmail.com> wrote: > Hi, > > Please see the following plan: > > postgres=# explain select * from small_table left outer join big_table > using (id); > QUERY PLAN > > > ---------------------------------------------------------------------------- > Hash Left Join (cost=126408.00..142436.98 rows=371 width=12) > Hash Cond: (small_table.id = big_table.id) > -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8) > -> Hash (cost=59142.00..59142.00 rows=4100000 width=8) > -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000 > width=8) > (5 rows) > > Here I have a puzzle, why not choose the small table to build hash table? > It can avoid multiple batches thus save significant I/O cost, isn't it? > > We can perform this query in two phases: > 1) inner join, using the small table to build hash table. > 2) check whether each tuple in the hash table has matches before, which can > be done with another flag bit > > The only compromise is the output order, due to the two separate phases. > Not sure whether the SQL standard requires it. > > SQL standard does not require the result to be in any particular order unless an ORDER BY is used.
Regards, -- gurjeet.singh @ EnterpriseDB - The Enterprise Postgres Company http://www.EnterpriseDB.com singh.gurj...@{ gmail | yahoo }.com Twitter/Skype: singh_gurjeet Mail sent from my BlackLaptop device