Thanks Doug,
coalesce can invoke a shuffle as well: with shuffle = true it behaves like repartition, and with shuffle = false it can only reduce the partition count by merging existing partitions.
I don't think what I'm suggesting exists as a feature yet, but it definitely should.
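For reference, here is a minimal sketch of the two coalesce modes. It assumes a live SparkContext named sc with spark.executor.instances set on its conf; the HDFS path is purely illustrative:

```scala
// Sketch only: assumes an existing SparkContext `sc` and that
// spark.executor.instances is set. The input path is hypothetical.
val rdd = sc.textFile("hdfs:///path/to/small-files/*")

// Doug's suggested target: roughly 10 partitions per executor.
val numExecutors = sc.getConf.getInt("spark.executor.instances", -1)
val target = numExecutors * 10

// shuffle = false merges existing partitions (no shuffle),
// but can only decrease the number of partitions.
val merged = rdd.coalesce(target, shuffle = false)

// shuffle = true performs a full shuffle to balance the data,
// equivalent to rdd.repartition(target).
val reshuffled = rdd.coalesce(target, shuffle = true)
```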

Daniel

On Mon, Jul 20, 2015 at 4:15 PM, Doug Balog <d...@balog.net> wrote:

> Hi Daniel,
> Take a look at .coalesce()
> I've seen good results by coalescing to num executors * 10, but I'm still
> trying to figure out the optimal number of partitions per executor.
> To get the number of executors:
> sc.getConf.getInt("spark.executor.instances", -1)
>
>
> Cheers,
>
> Doug
>
> > On Jul 20, 2015, at 5:04 AM, Daniel Haviv <
> daniel.ha...@veracity-group.com> wrote:
> >
> > Hi,
> > My data is constructed from a lot of small files which results in a lot
> of partitions per RDD.
> > Is there some way to locally repartition the RDD without shuffling so
> that all of the partitions that reside on a specific node will become X
> partitions on the same node ?
> >
> > Thank you.
> > Daniel
>
>
