[GENERAL] Re: [SQL] bad select performance fixed by forbidding hash joins

George Young Wed, 21 Jul 1999 07:36:24 -0700
[PostgreSQL 6.5.0 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66]
table opset_steps
      (name text, id int2, ver int2) [1400 rows]
      non-unique index is on (id, ver)
 
table run_opsets
      (status int2, id int2, ver int2, run_id int2, seq int2) [17000 rows]
      pkey is (id, seq), second index on(status, id, ver, run_id)
      select count(*) from run_opsets where status=1; --> 187
      select count(*) from run_opsets where status=3; --> 10564
 
table runs
      (run_name text, run_id int2, status int2) [900 rows]
      pkey is run_name, second index(run_id, status)      
      select count(*)from runs where status=1; -->68
 
I have vacuum analyzed all relevant tables.

explain select os.name,r.run_name,ro.status from opset_steps os,runs r,run_opsets ro 
where (ro.status=3 or ro.status=1) and ro.opset_id=os.opset_id and ro.run_id=r.run_id 
and ro.opset_ver=os.opset_ver and r.status=1;

Hash Join  (cost=1793.58 rows=14560 width=38)
  ->  Hash Join  (cost=1266.98 rows=14086 width=24)
        ->  Seq Scan on run_opsets ro  (cost=685.51 rows=13903 width=8)
        ->  Hash  (cost=70.84 rows=1389 width=16)
              ->  Seq Scan on opset_steps os  (cost=70.84 rows=1389 width=16)
  ->  Hash  (cost=47.43 rows=374 width=14)
        ->  Seq Scan on runs r  (cost=47.43 rows=374 width=14)

This query takes 16 seconds. [returns 3126 rows]

On Tue, Jul 20, 1999 at 05:42:20PM -0400, Tom Lane wrote:
> On Tue, 20 Jul 1999 14:56:46 -0400 George Young wrote:
> > ... Is this then
> > the best that postgres can do?  Is there some rephrasing/restructuring of
> > this query that would make it faster?
> 
> Hard to say.  The query looks reasonable as it stands --- 
> ...  You have no restriction
> clause on opset_steps so all of those entries get loaded for hashing;
> can you provide one?
No. 
> The system's plan looks pretty reasonable as well.  It might be that
> a merge join would be faster than a hash join but I wouldn't assume
> so.  If you want, you can try forcing the system not to use hashes;
> start psql with environment variable
>       PGOPTIONS="-fh"
> and see what sort of plan and execution time you get.  If that does
> turn out to be a big win it would indicate that the planner is using
> poor cost estimates, which is certainly possible...

Yes!  PGOPTIONS="-fh" made the query time go from 16 seconds to 2 seconds!
Is this a safe thing to leave on permanently, or is there some way to set
PGOPTIONS for just this query?

explain select os.name,r.run_name,ro.status from opset_steps os,runs r,run_opsets ro 
where (ro.status=3 or ro.status=1) and ro.opset_id=os.opset_id and ro.run_id=r.run_id 
and ro.opset_ver=os.opset_ver and r.status=1;

Merge Join  (cost=9295.54 rows=14560 width=38)
  ->  Seq Scan  (cost=8676.01 rows=14371 width=22)
        ->  Sort  (cost=8676.01 rows=14371 width=22)
              ->  Merge Join  (cost=1657.30 rows=14371 width=22)
                    ->  Index Scan using run_opsets_pkey on run_opsets ro  
(cost=1031.25 rows=13903 width=8)
                    ->  Seq Scan  (cost=154.91 rows=374 width=14)
                          ->  Sort  (cost=154.91 rows=374 width=14)
                                ->  Seq Scan on runs r  (cost=47.43 rows=374 width=14)
  ->  Index Scan using opset_steps_idx_ver_id on opset_steps os  (cost=99.45 rows=1389 
width=16)


With PGOPTIONS=-fh, this query takes ~ 2 seconds! [returns 3126 rows]

-- 
George Young,  Rm. L-204                [EMAIL PROTECTED]
MIT Lincoln Laboratory
244 Wood St.
Lexington, Massachusetts  02420-9108    (781) 981-2756
[GENERAL] Re: [SQL] bad select performance fixed by forbidding hash joins

Reply via email to