Re: Optimal way to re-partition from a single partition

Takeshi Yamamuro Mon, 08 Feb 2016 22:45:07 -0800

Hi,

Please use DataFrame#repartition (e.g. df.repartition(100)) rather than
going through the RDD with coalesce.
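
As a rough sketch of what that could look like (the helper name `spreadOut` and the default of 100 partitions are only illustrative; 100 is just the number used in the question):

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of Spark: redistributes the rows of `df`
// across `numPartitions` partitions using a full shuffle.
def spreadOut(df: DataFrame, numPartitions: Int = 100): DataFrame =
  df.repartition(numPartitions)
```

With the `df` from the question, `val newDF = spreadOut(df)` stays within the DataFrame API, so there is no need to rebuild the result with createDataFrame. Note that repartition shuffles the rows, so the ordering produced by orderBy is not kept across the new partitions. `newDF.rdd.partitions.length` should report the new partition count.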


On Tue, Feb 9, 2016 at 7:30 AM, Cesar Flores <ces...@gmail.com> wrote:

>
> I have a data frame which I sort using the orderBy function. This operation
> causes my data frame to go to a single partition. After using those
> results, I would like to re-partition to a larger number of partitions.
> Currently I am just doing:
>
> // df is a DataFrame with a single partition and around 14 million records
> val rdd = df.rdd.coalesce(100, true)
> val newDF = hc.createDataFrame(rdd, df.schema)
>
> This process is really slow. Is there any other way of achieving this
> task, or to optimize it (perhaps tweaking a spark configuration parameter)?
>
>
> Thanks a lot
> --
> Cesar Flores
>



-- 
---
Takeshi Yamamuro
