Re: [GENERAL] How does PostgerSQL planner decide driving table

Emanuel Calvo Mon, 06 Oct 2014 09:28:03 -0700

El 06/10/14 02:38, Shingo horiuchi escribió:
>
> Dear All,
>
>  
>
> I have read about query tuning and attempt to check the impact of
>
> exchanging the driving table in Join condition.
>
>  
>
> To test Simple Join condition, I prepared Two tables.
>
> One is the table for employees and another is the table for departments.
>
> Employees table has the foreign key which referencing departments table.
>
>  
>
> The proportion of each table, filtered table and joined table is below:
>
> (E means employees table and D means departments table.)
>
> #rows in E                 #rows in D                #filtered rows in
> E                    #filtered rows in D                   #rows in E
> and D
>
> 10000                          490                             
> 1000                                              
> 245                                                 9800
>
>  
>
> After make the index on the filtered column, I tried the query:
>
> EXPLAIN ANALYZE
>
> SELECT   D.Department_Name, E.Last_Name, E.First_Name
>
> FROM Employees E, Departments D
>
> WHERE E.Department_Id=D.Department_Id
>
>   AND E.Exempt_Flag='Y'
>
>   AND D.US_Based_Flag='Y'
>
> ;
>
>  
>
> Result was:
>
> Hash Join  (cost=8.85..241.59 rows=499 width=15) (actual
> time=0.105..2.052 rows=518 loops=1)
>
>    Hash Cond: (e.department_id = d.department_id)
>
>    ->  Seq Scan on employees e  (cost=0.00..209.00 rows=5000 width=17)
> (actual time=0.007..1.541 rows=5000 loops=1)
>
>          Filter: (exempt_flag = 'Y'::bpchar)
>
>          Rows Removed by Filter: 5000
>
>    ->  Hash  (cost=8.24..8.24 rows=49 width=14) (actual
> time=0.087..0.087 rows=49 loops=1)
>
>          Buckets: 1024  Batches: 1  Memory Usage: 3kB
>
>          ->  Bitmap Heap Scan on departments d  (cost=4.63..8.24
> rows=49 width=14) (actual time=0.069..0.078 rows=49 loops=1)
>
>                Recheck Cond: (us_based_flag = 'Y'::bpchar)
>
>                ->  Bitmap Index Scan on dept2_flg_idx 
> (cost=0.00..4.62 rows=49 width=0) (actual time=0.063..0.063 rows=49
> loops=1)
>
>                      Index Cond: (us_based_flag = 'Y'::bpchar)
>
> Total runtime: 2.095 ms
>



I would recommend to run an ANALYZE on both tables.

You can't exchange the seqscan table on that query due that it needs to
read ALL the
records on E table. If you seqscan D first, the engine needs to read
again all the E
records.

If you want all the employees across all departments, you'll always end
up reading all
the employees.

What you can do to other plans, is disabling enable_hashjoin.

> */ /*
>
> In order to exchange the driving table, I tried the query:
>
> EXPLAIN ANALYZE
>
> SELECT   D.Department_Name, E.Last_Name, E.First_Name
>
> FROM Departments D, Employees E
>
> WHERE D.Department_Id=E.Department_Id
>
>   AND E.Exempt_Flag='Y'
>
>   AND D.US_Based_Flag='Y'
>
> ;
>
>  
>
> However, the result was same.
>
> I think this is because the query planner can optimizer the 2^nd
>  query based on table statistics of E and D.
>
> E being the larger number of records and has higher filtering rate so
> it continues to be driving table.
>
>  
>
> Now, I tried another test case to confirm my assumption.
>
> The proportion of another test case is different from above one.
>
> The proportion of each table, filtered table and joined table is below:
>
> (E means employees table and D means departments table.)
>
> #rows in E                 #rows in D                #filtered rows in
> E                    #filtered rows in D                   #rows in E
> and D
>
> 10000                         
> 490                              5000                                         
>      
> 49                                                  9800
>
> The important point is the difference in the filtering rate.
>
> In this case, departments table is higher filtering rate,
>
> so taking departments table as driving table will be able to cut the
> computational cost, I think.
>
>  
>
> I tried same query:
>
> EXPLAIN ANALYZE
>
> SELECT   D.Department_Name, E.Last_Name, E.First_Name
>
> FROM Employees E, Departments D
>
> WHERE E.Department_Id=D.Department_Id
>
>   AND E.Exempt_Flag='Y'
>
>   AND D.US_Based_Flag='Y'
>
> And
>
> EXPLAIN ANALYZE
>
> SELECT   D.Department_Name, E.Last_Name, E.First_Name
>
> FROM Employees E, Departments D
>
> WHERE E.Department_Id=D.Department_Id
>
>   AND E.Exempt_Flag='Y'
>
>   AND D.US_Based_Flag='Y'
>
>  
>

Both queries are identical (?).


-- 
--
Emanuel Calvo                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [GENERAL] How does PostgerSQL planner decide driving table

Reply via email to