Re: repartition vs partitionby

Raghavendra Pandey Sat, 17 Oct 2015 05:58:56 -0700

You can use the coalesce function if you want to reduce the number of
partitions. It minimizes data shuffling, because it merges existing
partitions instead of redistributing every record.
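As a rough illustration only, here is a toy pure-Python model of the difference. It is not Spark code, and the names `coalesce_partitions` and `partition_by_key` are hypothetical. Coalescing merges whole existing partitions, so no individual record is re-routed. A hash-based partitionBy re-routes every record by its key, which also means that all records sharing one key end up in the same partition.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, int]        # (key, value) pair, like an element of a pair RDD
Partition = List[Record]


def coalesce_partitions(parts: List[Partition], n: int) -> List[Partition]:
    """Toy model of coalesce without shuffle (hypothetical helper): whole
    partitions are merged into at most n groups; records are never re-routed."""
    if n <= 0:
        raise ValueError("n must be positive")
    n = min(n, len(parts))
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        # Partition i joins group i * n // len(parts), so neighbours merge together.
        merged[i * n // len(parts)].extend(part)
    return merged


def partition_by_key(parts: List[Partition], n: int) -> List[Partition]:
    """Toy model of a full shuffle with hash partitioning (hypothetical helper):
    every record goes to the partition chosen by a stable hash of its key."""
    out: List[Partition] = [[] for _ in range(n)]
    for part in parts:
        for key, value in part:
            out[zlib.crc32(key.encode()) % n].append((key, value))
    return out


if __name__ == "__main__":
    parts = [[("a", 1), ("a", 2)], [("b", 3)], [("a", 4), ("c", 5)], [("d", 6)]]
    print([len(p) for p in coalesce_partitions(parts, 2)])  # [3, 3]
    shuffled = partition_by_key(parts, 2)
    # Every record with key "a" lands in the same output partition.
    print([sum(1 for k, _ in p if k == "a") for p in shuffled])
```

In real Spark the corresponding calls are `rdd.coalesce(n)` and `rdd.partitionBy(n)` on a pair RDD; the model above only mimics where records end up, not how Spark schedules the work.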


-Raghav

On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri <shahidashr...@icloud.com> wrote:

> Hi folks
>
> I need to repartition a large data set (around 300 GB), as I see that some
> partitions hold much more data than others (data skew).
>
> I have pair RDDs: [({},{}),({},{}),({},{})]
>
> What is the best way to solve this problem?
