Re: repartition vs partitionby

shahid ashraf Sat, 17 Oct 2015 06:15:04 -0700

Yes, I know about that; it applies when you want to reduce the number of
partitions. The point here is that the data is skewed toward a few partitions.
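
To make the skew concrete, here is a minimal pure-Python sketch (no Spark involved) of why hash partitioning on a hot key overloads one partition, and how "salting" the key spreads those records out. Everything in it is an illustrative assumption and not something proposed in this thread: the names partition_of, partition_sizes, and salt_key are hypothetical, the data is synthetic, and the MD5-based hash is only a deterministic stand-in for a real partitioner.

```python
import hashlib
import random
from collections import Counter


def partition_of(key, num_partitions):
    """Stand-in for a hash partitioner: stable hash of the key modulo the partition count."""
    digest = hashlib.md5(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_key(key, num_salts, rng):
    """Append a random salt so one hot key becomes up to num_salts distinct keys."""
    return (key, rng.randrange(num_salts))


rng = random.Random(0)  # fixed seed so the run is repeatable
num_partitions = 8

# Synthetic skew: 90% of the records share a single hot key.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

plain = partition_sizes(keys, num_partitions)
salted = partition_sizes([salt_key(k, 32, rng) for k in keys], num_partitions)

print("largest partition, plain keys: ", max(plain))
print("largest partition, salted keys:", max(salted))
```

With plain keys, every "hot" record hashes to the same partition, so changing the partition count alone does not remove the skew. Salting trades that for an extra step: any per-key aggregation has to be done on the salted keys first and then combined again per original key. In Spark, the same idea would be applied by mapping the pair RDD's keys before calling partitionBy.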



On Sat, Oct 17, 2015 at 6:27 PM, Raghavendra Pandey <
[email protected]> wrote:

> You can use the coalesce function if you want to reduce the number of
> partitions. It minimizes data shuffling.
>
> -Raghav
>
> On Sat, Oct 17, 2015 at 1:02 PM, shahid qadri <[email protected]>
> wrote:
>
>> Hi folks
>>
>> I need to repartition a large data set (around 300 GB), as I see that some
>> partitions hold far more data than others (data skew).
>>
>> I have pair RDDs of the form [({},{}),({},{}),({},{})].
>>
>> What is the best way to solve the problem?
>>
>>
>


-- 
with Regards
Shahid Ashraf
