maybe each of the file parts has many blocks?
did you try SparkContext.coalesce to reduce the number of partitions? can
be done w/ or w/o data-shuffle.

*Noam Barcay*
Developer // *Kenshoo*
*Office* +972 3 746-6500 *427 // *Mobile* +972 54 475-3142
__________________________________________
*www.Kenshoo.com* <http://kenshoo.com/>

On Wed, Jan 21, 2015 at 5:31 PM, Wang, Ningjun (LNG-NPV) <
ningjun.w...@lexisnexis.com> wrote:

>  Why sc.objectFile(…) return a Rdd with thousands of partitions?
>
>
>
> I save a rdd to file system using
>
>
>
> rdd.saveAsObjectFile(“file:///tmp/mydir”)
>
>
>
> Note that the rdd contains 7 millions object. I check the directory
> /tmp/mydir/, it contains 8 partitions
>
>
>
> part-00000  part-00002  part-00004  part-00006  _SUCCESS
>
> part-00001  part-00003  part-00005  part-00007
>
>
>
> I then load the rdd back using
>
>
>
> val rdd2 = sc.objectFile[LabeledPoint]( (“file:///tmp/mydir”, 8)
>
>
>
> I expect rdd2 to have 8 partitions. But from the master UI, I see that
> rdd2 has over 1000 partitions. This is very inefficient. How can I limit it
> to 8 partitions just like what is stored on the file system?
>
>
>
> Regards,
>
>
>
> *Ningjun Wang*
>
> Consulting Software Engineer
>
> LexisNexis
>
> 121 Chanlon Road
>
> New Providence, NJ 07974-1541
>
>
>

-- 
This e-mail, as well as any attached document, may contain material which 
is confidential and privileged and may include trademark, copyright and 
other intellectual property rights that are proprietary to Kenshoo Ltd, 
 its subsidiaries or affiliates ("Kenshoo"). This e-mail and its 
attachments may be read, copied and used only by the addressee for the 
purpose(s) for which it was disclosed herein. If you have received it in 
error, please destroy the message and any attachment, and contact us 
immediately. If you are not the intended recipient, be aware that any 
review, reliance, disclosure, copying, distribution or use of the contents 
of this message without Kenshoo's express permission is strictly prohibited.

Reply via email to