Hi Neal,
spark.sql.shuffle.partitions is the property that controls the number of tasks
after a shuffle (to generate t2, there is a shuffle for the aggregations
specified by groupBy and agg). You can use
sqlContext.setConf("spark.sql.shuffle.partitions", "newNumber") or
sqlContext.sql("set spark.sql.sh
I am having some trouble controlling the number of Spark tasks for a stage. This is
on the latest Spark 1.3.x source code build.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
sc.getConf.get("spark.default.parallelism") -> setup to 10
val t1 = hiveContext.sql("FROM SalesJan2009 select * ")
val
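Presumably the snippet continued by building t2 with the groupBy/agg the reply above refers to; a rough sketch of that step, with placeholder column names, and why spark.default.parallelism does not affect it:

import org.apache.spark.sql.functions._
val t2 = t1.groupBy("country").agg(sum("price"))   // placeholder columns
t2.collect()

// this aggregation stage runs with spark.sql.shuffle.partitions tasks (200 by default),
// not with spark.default.parallelism, which is why setting parallelism to 10 has no effect here
println(t2.rdd.partitions.size)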