Yes, I know about that; it is for reducing the number of partitions. The point here is that the data is skewed toward a few partitions.
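
For illustration only (this is not from the thread): one common way to spread a hot key over more partitions is key salting. The sketch below is plain Python, not Spark. `partition_of`, `partition_sizes`, and `salt_keys` are hypothetical helpers, the CRC-based hash is only a stand-in for a real hash partitioner, and the data is made up.

```python
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    """Deterministic stand-in for a hash partitioner (not Spark's actual hash)."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def partition_sizes(pairs, num_partitions):
    """Count how many (key, value) records land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k, _ in pairs)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(pairs, num_salts):
    """Replace each key k with (k, salt), cycling salt through 0..num_salts-1."""
    return [((k, i % num_salts), v) for i, (k, v) in enumerate(pairs)]


# Made-up skewed data: a single hot key carries 90% of the records.
pairs = [("hot", i) for i in range(9000)] + [("key%d" % i, i) for i in range(1000)]

before = partition_sizes(pairs, 8)                 # hot key sits in one partition
after = partition_sizes(salt_keys(pairs, 64), 8)   # hot key split across salts

print("records per partition before salting:", before)
print("records per partition after salting: ", after)
```

In Spark itself the same idea would mean mapping each key to (key, salt) before the shuffle. Any per-key aggregation then has to run in two stages: first on the salted key, then again on the original key after the salt is dropped.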

On Sat, Oct 17, 2015 at 6:27 PM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:

> You can use the coalesce function if you want to reduce the number of
> partitions. It minimizes the data shuffle.
>
> -Raghav
>
> On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri <shahidashr...@icloud.com>
> wrote:
>
>> Hi folks,
>>
>> I need to repartition a large set of data (around 300G), as I see some
>> portions have large data (data skew).
>>
>> I have pairRDDs [({},{}),({},{}),({},{})]
>>
>> What is the best way to solve the problem?

--
With regards,
Shahid Ashraf