Can you include the output of `explain()` for each of the runs?

On Tue, Sep 1, 2015 at 1:06 AM, patcharee <patcharee.thong...@uni.no> wrote:
> Hi,
>
> I found that sorting in Spark 1.5 is very slow compared to Spark 1.4.
> Below is my code snippet:
>
> val sqlRDD = sql("select date, u, v, z from fino3_hr3 where zone == 2 and z >= 2 and z <= order by date, z")
> println("sqlRDD " + sqlRDD.count())
>
> fino3_hr3 (in the SQL command) is a Hive table in ORC format,
> partitioned by zone and z.
>
> Spark 1.5 takes 4.5 minutes to execute this SQL, while Spark 1.4 takes
> 1.5 minutes. I noticed that, unlike Spark 1.4, when Spark 1.5 sorted, the
> data was shuffled into only a few tasks instead of being divided among
> all tasks. Do I need to set any configuration explicitly? Any suggestions?
>
> BR,
> Patcharee
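
For reference, here is a minimal sketch of one way the requested plans could be captured. It assumes a Spark 1.4 or 1.5 shell in which `sqlContext` is the HiveContext used for the query; `showPlan` is a hypothetical helper name, not part of Spark, and the query text is passed in as a parameter rather than reproduced here.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper (not a Spark API): builds the DataFrame for a query
// and prints its plans without triggering the job itself.
def showPlan(sqlContext: SQLContext, query: String): DataFrame = {
  val df = sqlContext.sql(query)
  // extended = true prints the logical plans as well as the physical plan
  df.explain(true)
  df
}
```

Calling `showPlan(sqlContext, query)` once under each Spark version, with `query` set to the statement from the quoted message, should give two plans that can be compared to see how the sort and shuffle steps are planned in each run.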