Hi, I am trying to execute a simple query that joins 3 tables. When I look at the execution plan, it varies with the position of each table in the "from" clause. The execution plan looks more optimized when the table with predicates is listed before the other tables.
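One way to make such a comparison reproducible is to print the plans of both query variants with DataFrame.explain. The following is a minimal sketch, assuming PySpark 2.x or later. The helper name print_plans, its parameters, and the paths in the example call are illustrative assumptions and are not taken from the setup described in this message.

```python
def print_plans(spark, table_paths, queries):
    """Register parquet files as temp views, then print each query's plans.

    spark       -- an existing SparkSession (assumed, PySpark 2.x+)
    table_paths -- dict: view name -> parquet path (hypothetical paths)
    queries     -- dict: label -> SQL text to inspect
    """
    # Make every parquet-backed table visible to SQL under its view name.
    for view_name, path in table_paths.items():
        spark.read.parquet(path).createOrReplaceTempView(view_name)

    # explain(True) prints the parsed, analyzed and optimized logical plans
    # followed by the physical plan, so the variants can be compared by eye.
    for label, sql_text in queries.items():
        print("== " + label + " ==")
        spark.sql(sql_text).explain(True)


# Example call (all paths are hypothetical placeholders):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("plan-comparison").getOrCreate()
#   print_plans(
#       spark,
#       {"table1": "/path/to/table1", "table2": "/path/to/table2",
#        "table3": "/path/to/table3"},
#       {"original": original_sql, "reordered": reordered_sql},
#   )
```

Passing the join-syntax query and the comma-syntax query as the two entries of queries prints both sets of plans one after the other, which makes differences in join order and filter placement easier to spot.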
Original query:

    select distinct pge.portfolio_code
    from table1 pge
    join table2 p on p.perm_group = pge.anc_port_group
    join table3 uge on p.user_group = uge.anc_user_group
    where uge.user_name = 'user'
      and p.perm_type = 'TEST'

Optimized query (table with predicates is moved ahead):

    select distinct pge.portfolio_code
    from table1 uge, table2 p, table3 pge
    where uge.user_name = 'user'
      and p.perm_type = 'TEST'
      and p.perm_group = pge.anc_port_group
      and p.user_group = uge.anc_user_group

The execution plan is more optimized for the optimized query, and hence that query executes faster. All the tables are sourced from parquet files.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Optimization-tp26548.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.