It's helpful if you can include the output of EXPLAIN EXTENDED or df.explain(true) whenever asking about query performance.
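As a rough sketch of what that looks like for the original query quoted below (assuming table1, table2 and table3 are registered under those names, as in the message), prefix the statement with EXPLAIN EXTENDED:

```sql
-- EXPLAIN EXTENDED prints the parsed, analyzed and optimized logical plans
-- plus the physical plan, instead of returning the query's rows.
EXPLAIN EXTENDED
select distinct pge.portfolio_code
from table1 pge join table2 p
  on p.perm_group = pge.anc_port_group
join table3 uge
  on p.user_group = uge.anc_user_group
where uge.user_name = 'user' and p.perm_type = 'TEST'
```

Doing the same for the reordered query and posting both outputs makes it possible to compare the two plans side by side.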
On Mon, Mar 21, 2016 at 6:27 AM, gtinside <gtins...@gmail.com> wrote:

> Hi,
>
> I am trying to execute a simple query with join on 3 tables. When I look at
> the execution plan, it varies with position of table in the "from" clause.
> Execution plan looks more optimized when the position of table with
> predicates is specified before any other table.
>
> Original query :
>
> select distinct pge.portfolio_code
> from table1 pge join table2 p
> on p.perm_group = pge.anc_port_group
> join table3 uge
> on p.user_group=uge.anc_user_group
> where uge.user_name = 'user' and p.perm_type = 'TEST'
>
> Optimized query (table with predicates is moved ahead):
>
> select distinct pge.portfolio_code
> from table1 uge, table2 p, table3 pge
> where uge.user_name = 'user' and p.perm_type = 'TEST'
> and p.perm_group = pge.anc_port_group
> and p.user_group=uge.anc_user_group
>
> Execution plan is more optimized for the optimized query and hence the
> query executes faster. All the tables are being sourced from parquet files
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Optimization-tp26548.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.