Hi,

You can call coalesce(N), where N is the number of partitions you want to reduce to, after loading the data into an RDD.
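For example, a minimal sketch against the HiveContext API from the original question (the output path and the target of 200 partitions are hypothetical, chosen just for illustration):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext()
val hiveContext = new HiveContext(sc)

// The initial scan still creates one task per ORC file (~10K here),
// but coalesce reduces the partition count for all downstream stages.
val df = hiveContext.sql("SELECT * FROM partitioned_orc_table") // hypothetical table
val reduced = df.coalesce(200) // 200 is an example target, tune for your cluster

reduced.write.orc("/tmp/coalesced_output") // hypothetical output path
```

Note that coalesce avoids a full shuffle by merging existing partitions on the same executors; if you need an even redistribution of data you would use repartition(N) instead, at the cost of a shuffle.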
HTH,
Deng

On Wed, Oct 7, 2015 at 6:34 PM, patcharee <patcharee.thong...@uni.no> wrote:
> Hi,
>
> I run a SQL query over about 10,000 partitioned ORC files. Because of the
> partition schema, the files cannot be merged any further (to reduce the
> total number).
>
> From the command hiveContext.sql(sqlText), 10K tasks were created, one to
> handle each file. Is it possible to use fewer tasks? How can I force Spark
> SQL to use fewer tasks?
>
> BR,
> Patcharee