Yes, coalesce() does not necessarily shuffle. I believe it will not if
you are strictly reducing the number of partitions and do not force a
shuffle (shuffle = true). So I think the answer is 'yes'.
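
To make the no-shuffle case concrete, here is a toy model in plain
Python. It is not Spark code and not how Spark is implemented
internally. It assumes partitions are plain lists and that coalescing
merges whole neighbouring partitions; the helper name
coalesce_no_shuffle is made up for illustration. Real Spark may group
partitions differently (for example by locality), which this sketch
ignores.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce_no_shuffle(partitions: List[List[T]], num_partitions: int) -> List[List[T]]:
    """Toy model: merge whole parent partitions into fewer groups.

    Hypothetical helper, not a Spark API. Records are never split out of
    their parent partition, which is why no shuffle is needed.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    # Asking for more partitions than exist is a no-op without a shuffle.
    target = min(num_partitions, len(partitions))
    merged: List[List[T]] = [[] for _ in range(target)]
    for i, parent in enumerate(partitions):
        # Map parent i to a contiguous group (a simplification of Spark's grouping).
        merged[i * target // len(partitions)].extend(parent)
    return merged


parents = [[1, 2], [3], [4, 5, 6], [7]]
print(coalesce_no_shuffle(parents, 2))  # four parents merged into two groups
print(coalesce_no_shuffle(parents, 8))  # cannot grow without a shuffle
```

The point of the sketch is only that each parent partition ends up
whole inside one merged partition, so no record has to be
redistributed individually.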
If you have a huge number of small files, you can also consider
wholeTextFiles, which gives you entire files of content in
rdd.coalesce(1) will coalesce the RDD into a single partition and give
only one output file. coalesce(2) will give 2, and so on.
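
As a rough illustration of "one output file per partition", the
following plain-Python sketch writes one part file per list of records.
It does not use Spark; save_as_text_files and merge_partitions are
hypothetical stand-ins for saveAsTextFile and coalesce, and the
contiguous grouping is an assumption made for simplicity.

```python
import os
import tempfile
from typing import List


def merge_partitions(partitions: List[List[str]], n: int) -> List[List[str]]:
    """Hypothetical helper: group whole partitions into at most n buckets."""
    n = min(n, len(partitions))
    buckets: List[List[str]] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        buckets[i * n // len(partitions)].extend(part)
    return buckets


def save_as_text_files(partitions: List[List[str]], out_dir: str) -> List[str]:
    """Toy stand-in for saveAsTextFile: write one part file per partition."""
    os.makedirs(out_dir, exist_ok=True)
    names = []
    for index, records in enumerate(partitions):
        name = f"part-{index:05d}"
        with open(os.path.join(out_dir, name), "w") as handle:
            for record in records:
                handle.write(record + "\n")
        names.append(name)
    return names


partitions = [["a", "b"], ["c"], ["d", "e"], ["f"]]
with tempfile.TemporaryDirectory() as tmp:
    for n in (4, 2, 1):
        out_dir = os.path.join(tmp, f"out-{n}")
        save_as_text_files(merge_partitions(partitions, n), out_dir)
        # The number of files written matches the number of partitions.
        print(n, sorted(os.listdir(out_dir)))
```

In actual Spark the equivalent would be along the lines of
rdd.coalesce(n) followed by saveAsTextFile on the result.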
On Jan 23, 2015 4:58 AM, "Sean Owen" wrote:
> One output file is produced per partition. If you want fewer, use
> coalesce() before saving the RDD.
>
> On Thu, Jan 22, 2015 at 10:46 PM, Kane Kim
One output file is produced per partition. If you want fewer, use
coalesce() before saving the RDD.
On Thu, Jan 22, 2015 at 10:46 PM, Kane Kim wrote:
> How I can reduce number of output files? Is there a parameter to
> saveAsTextFile?
>
> Thanks.