Hi, Plz use DataFrame#repartition.
On Tue, Feb 9, 2016 at 7:30 AM, Cesar Flores <ces...@gmail.com> wrote: > > I have a data frame which I sort using orderBy function. This operation > causes my data frame to go to a single partition. After using those > results, I would like to re-partition to a larger number of partitions. > Currently I am just doing: > > val rdd = df.rdd.coalesce(100, true) //df is a dataframe with a single > partition and around 14 million records > val newDF = hc.createDataFrame(rdd, df.schema) > > This process is really slow. Is there any other way of achieving this > task, or to optimize it (perhaps tweaking a spark configuration parameter)? > > > Thanks a lot > -- > Cesar Flores > -- --- Takeshi Yamamuro