Yes, I am trying to do so, but that repartitions the whole data set. Can't we split just one large (skewed) partition into multiple partitions? Any ideas on this?
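One common workaround for this is key salting: append a small random suffix to the hot key so its records spread over several partitions instead of one, then strip the salt after the per-partition work. The sketch below is plain Python illustrating the idea only; `NUM_SALTS`, `HOT_KEYS`, `salt_key`, and `partition_of` are hypothetical names for this example, not Spark API.

```python
import random

# Key-salting sketch (illustrative names, not Spark API):
# a hot key is rewritten to one of NUM_SALTS salted variants,
# so a hash partitioner sends its records to several partitions.
NUM_SALTS = 4
HOT_KEYS = {"hot"}  # keys known (or measured) to be skewed

def salt_key(key):
    """Return a (key, salt) pair; only hot keys get a random salt."""
    if key in HOT_KEYS:
        return (key, random.randrange(NUM_SALTS))
    return (key, 0)

def partition_of(key, num_partitions):
    """Mimic hash partitioning on the salted key: the salted
    variants of a hot key usually land in different partitions."""
    return hash(salt_key(key)) % num_partitions
```

In PySpark this would look roughly like `rdd.map(lambda kv: (salt_key(kv[0]), kv[1])).partitionBy(num_partitions)`, with a final `map` to drop the salt (and, for aggregations, a second reduce over the unsalted key).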
On Sun, Oct 18, 2015 at 1:55 AM, Adrian Tanase <atan...@adobe.com> wrote:

> If the dataset allows it you can try to write a custom partitioner to help
> spark distribute the data more uniformly.
>
> Sent from my iPhone
>
> On 17 Oct 2015, at 16:14, shahid ashraf <sha...@trialx.com> wrote:
>
> Yes, I know about that; it's for the case of reducing partitions. The point here is
> that the data is skewed toward a few partitions.
>
> On Sat, Oct 17, 2015 at 6:27 PM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:
>
>> You can use the coalesce function if you want to reduce the number of
>> partitions. This one minimizes the data shuffle.
>>
>> -Raghav
>>
>> On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri <shahidashr...@icloud.com> wrote:
>>
>>> Hi folks
>>>
>>> I need to repartition a large data set (around 300G), as I see some
>>> partitions have large amounts of data (data skew).
>>>
>>> I have pairRDDs [({},{}),({},{}),({},{})]
>>>
>>> What is the best way to solve this problem?
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>
> --
> with Regards
> Shahid Ashraf

--
with Regards
Shahid Ashraf
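Adrian's custom-partitioner suggestion can be sketched as a plain partition function: on the Python side, a pair RDD's `partitionBy(numPartitions, partitionFunc)` accepts an ordinary function from key to partition index, so no subclassing is needed. The sketch below is an assumption-laden illustration: `hot_key_targets` is a hypothetical mapping you would build from keys you have measured to be skewed, and routing each hot key to its own dedicated partition keeps it from crowding out the rest.

```python
# Sketch of a skew-aware partition function. hot_key_targets is a
# hypothetical mapping of known skewed keys to dedicated partitions;
# it is not part of any Spark API.
NUM_PARTITIONS = 8
hot_key_targets = {"hot_a": 0, "hot_b": 1}

def skew_aware_partitioner(key):
    # Known hot keys each get their own reserved partition.
    if key in hot_key_targets:
        return hot_key_targets[key]
    # Everything else hashes into the remaining partitions
    # (Python's % yields a non-negative result here).
    reserved = len(set(hot_key_targets.values()))
    return reserved + hash(key) % (NUM_PARTITIONS - reserved)
```

With PySpark this would be used roughly as `rdd.partitionBy(NUM_PARTITIONS, skew_aware_partitioner)`. Note this spreads *different* hot keys apart but cannot split a single hot key; for that, the salting approach above is needed.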