You can use the coalesce function if you want to reduce the number of partitions. It merges existing partitions, which minimizes the data shuffle.
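
Something like the following is a minimal sketch of the idea, not a tuned solution. The RDD contents, the partition counts and the object name are made up for illustration, and it runs in local mode. Note that coalesce on its own only merges partitions; if the goal is to even out skew rather than just reduce the partition count, repartition (which does a full shuffle) may be what you need, so both are shown.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; names and numbers are assumptions for illustration.
object CoalesceSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("coalesce-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Small stand-in for the real pair RDD, created with 16 partitions.
      val pairs = sc.parallelize(1 to 1000, 16).map(i => (i % 10, i))

      // coalesce merges existing partitions and avoids a full shuffle by default.
      val fewer = pairs.coalesce(4)
      println(s"after coalesce: ${fewer.partitions.length} partitions")

      // repartition always performs a full shuffle and redistributes the records.
      val rebalanced = pairs.repartition(32)
      println(s"after repartition: ${rebalanced.partitions.length} partitions")
    } finally {
      sc.stop()
    }
  }
}
```

The trade-off is cost versus balance: coalesce is cheap because it keeps records where they are, while repartition pays for a shuffle to spread the data out again.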
-Raghav

On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri <shahidashr...@icloud.com> wrote:
> Hi folks
>
> I need to repartition a large data set (around 300 GB), as I see some portions
> have a lot of data (data skew).
>
> I have pair RDDs: [({},{}),({},{}),({},{})]
>
> What is the best way to solve the problem?