Spark SQL Optimization

gtinside Mon, 21 Mar 2016 06:27:24 -0700

Hi,

I am trying to execute a simple query that joins 3 tables. When I look at
the execution plan, it varies with the position of the tables in the "from"
clause. The plan looks more optimized when the table with predicates is
listed before the other tables.



Original query :

select distinct pge.portfolio_code 
from table1 pge join table2 p
on p.perm_group = pge.anc_port_group 
join table3 uge
on p.user_group=uge.anc_user_group
where uge.user_name = 'user' and p.perm_type = 'TEST'

Optimized query (table with predicates is moved ahead):

select distinct pge.portfolio_code 
from table3 uge, table2 p, table1 pge
where uge.user_name = 'user' and p.perm_type = 'TEST' 
and p.perm_group = pge.anc_port_group 
and p.user_group=uge.anc_user_group


The execution plan is more optimized for the second query, and hence that
query executes faster. All the tables are sourced from Parquet files.
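
To illustrate why the join order could matter here, below is a toy sketch in
plain Python (not Spark). It counts the intermediate rows produced by the two
orderings of pge, p and uge. The sample rows, the hash_join helper and the
plan_a / plan_b functions are all made up for illustration, and plan_a
assumes the pessimistic case where no filter is applied before the joins,
which may not be what Spark's planner actually does for the first query.

```python
# Toy model in plain Python (not Spark). All data below is invented sample data.

def hash_join(left, right, left_key, right_key):
    """Inner equi-join of two lists of dicts using a hash index on `right`."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    out = []
    for l in left:
        for r in index.get(l[left_key], []):
            out.append({**l, **r})
    return out

# pge: large table without predicates; p and uge: small tables with predicates.
pge = [{"anc_port_group": g, "portfolio_code": f"PF{g}-{i}"}
       for g in range(10) for i in range(100)]
p = [{"perm_group": g, "user_group": g % 5,
      "perm_type": "TEST" if g < 2 else "PROD"} for g in range(10)]
uge = [{"anc_user_group": u, "user_name": "user" if u == 0 else f"other{u}"}
       for u in range(5)]

def plan_a():
    # Join in the order of the first query and filter only at the end
    # (assumption: no predicate pushdown at all).
    step1 = hash_join(pge, p, "anc_port_group", "perm_group")
    step2 = hash_join(step1, uge, "user_group", "anc_user_group")
    rows = [r for r in step2
            if r["user_name"] == "user" and r["perm_type"] == "TEST"]
    return {r["portfolio_code"] for r in rows}, len(step1) + len(step2)

def plan_b():
    # Filter the tables that carry predicates first, join them together,
    # and only then join the large table.
    uge_f = [r for r in uge if r["user_name"] == "user"]
    p_f = [r for r in p if r["perm_type"] == "TEST"]
    step1 = hash_join(uge_f, p_f, "anc_user_group", "user_group")
    step2 = hash_join(step1, pge, "perm_group", "anc_port_group")
    return {r["portfolio_code"] for r in step2}, len(step1) + len(step2)

result_a, work_a = plan_a()
result_b, work_b = plan_b()
print("same result:", result_a == result_b)
print("intermediate rows, plan A vs plan B:", work_a, work_b)
```

Both plans return the same set of portfolio codes, but the second one
materializes far fewer intermediate rows. In Spark itself, the plans of the
two queries can be compared with EXPLAIN EXTENDED or with explain(true) on
the DataFrame.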



