Hi,
You can call coalesce(N) after loading the data into an RDD, where N is
the number of partitions you want to reduce it to.
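
A minimal sketch of that suggestion, assuming the PySpark API, where
hiveContext.sql() returns a DataFrame that also has coalesce(). The
helper names pick_partition_count and query_with_fewer_tasks, and the
files_per_task parameter, are made up for illustration and are not part
of Spark:

```python
import math


def pick_partition_count(num_files, files_per_task):
    """Hypothetical helper: choose N so each task covers roughly
    files_per_task input files (never fewer than one partition)."""
    return max(1, math.ceil(num_files / files_per_task))


def query_with_fewer_tasks(hive_context, sql_text, num_partitions):
    """Run the query, then merge its partitions down to num_partitions.

    coalesce() combines existing partitions without a full shuffle, so
    the work should be scheduled as about num_partitions tasks rather
    than one task per file.
    """
    df = hive_context.sql(sql_text)
    return df.coalesce(num_partitions)
```

For example, query_with_fewer_tasks(hiveContext, sqlText,
pick_partition_count(10000, 50)) would ask for 200 partitions. The
right N depends on file sizes and executor memory, so treat 50 files
per task as a placeholder to tune.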
HTH,
Deng
On Wed, Oct 7, 2015 at 6:34 PM, patcharee wrote:
> Hi,
>
> I do a sql query on about 10,000 partitioned orc files. Because of the
> partition schema the files c
Hi,
I run a SQL query on about 10,000 partitioned ORC files. Because of the
partition schema, the files cannot be merged any further (to reduce the
total number).
The command hiveContext.sql(sqlText) created 10K tasks, one to handle
each file. Is it possible to use fewer tasks? How to