Hello, I wonder what is the state-of-art best practice to achieve best performance running complicated SQL query today in 2016? I am new to this topic and have read about
Hive on Tez Spark on Hive Spark SQL 2.0 (It seems Spark 2.0 supports complicated nest query) The documentation I read suggests Hive on Tez is the winner. But I wonder whether the new Spark 2.0 improves the performance and becomes the new champion? And FYI, the query I am running is complicated nested query, for instance: with CTEA as ( select hsld_dmi as hsld_dmi_A, cycle_num as cycle_num_A from TableA where stat_fl=0.0 ), CTEB as ( select hsld_dmi as hsld_dmi_B, cycle_num as cycle_num_B from TableB O where ( O.hsld_dmi in ( select distinct hsld_dmi from TableC T where T.cycle_num=282.0 and T.stat_fl=1.0 ) and O.stat_fl =0.0 ) ) select cycle_num_B, count(distinct hsld_dmi_B) as attrition_count from CTEB left join CTEA on ( CTEB.hsld_dmi_B = CTEA.hsld_dmi_A and CTEB.cycle_num_B = CTEA.cycle_num_A+1 ) where CTEA.hsld_dmi_A is NULL group by CTEB.cycle_num_B order by CTEB.cycle_num_B; Does Spark 2.0 support this type query? Best, Shi