
Re: hiveContext sql number of tasks

2015-10-07 Thread Deng Ching-Mallete

Hi,

You can call coalesce(N) after loading the data into an RDD, where N is the number of partitions you want to reduce it to.

HTH,
Deng

On Wed, Oct 7, 2015 at 6:34 PM, patcharee wrote:
> Hi,
>
> I do a sql query on about 10,000 partitioned orc files. Because of the
> partition schema the files c
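To make the coalesce(N) suggestion concrete, here is a small plain-Python model of what it does to the task count. This is only a sketch, not Spark's implementation: the `coalesce` function, the file names, and N = 200 are all made up for illustration, and real Spark also weighs data locality when it groups partitions. In Spark itself the call would presumably look like `hiveContext.sql(sqlText).coalesce(N)`, assuming a version where the query result exposes `coalesce`.

```python
def coalesce(partitions, n):
    """Group existing partitions into at most n contiguous groups.

    Illustrative model only: no rows are reshuffled, whole input
    partitions are simply handed to fewer tasks.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    total = len(partitions)
    n = min(n, total)  # grouping can never increase the partition count
    groups = []
    for i in range(n):
        start = i * total // n
        end = (i + 1) * total // n
        groups.append(partitions[start:end])
    return groups


# Hypothetical stand-ins for the ~10,000 ORC files, one partition per file.
files = ["part-%05d.orc" % i for i in range(10000)]

# Each group is read by a single task, so 10,000 tasks become 200.
tasks = coalesce(files, 200)
print(len(files), "input files ->", len(tasks), "tasks")
```

Each task then reads several files in turn rather than one, which is the trade-off: fewer tasks to schedule, but more work per task.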

hiveContext sql number of tasks

2015-10-07 Thread patcharee

Hi,

I run a SQL query on about 10,000 partitioned ORC files. Because of the partition schema, the files cannot be merged any further to reduce their total number. The command hiveContext.sql(sqlText) created 10K tasks, one to handle each file. Is it possible to use fewer tasks? How to