spread data across partitions evenly. In most cases calling repartition is
enough to solve the problem. If you have a special case, you might need to
create your own custom partitioner.
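
To make the custom partitioner idea concrete, here is a rough sketch in plain Python (no Spark needed) that simulates how records land on partitions. Everything in it is made up for illustration: the key names, HOT_KEYS, the partition count, and both helper functions. In Spark itself a Partitioner only sees the key, so the per-record "salt" used here would have to be made part of the key first.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4
HOT_KEYS = {"user_1"}  # hypothetical: keys known in advance to be skewed


def default_partition(key, num_partitions=NUM_PARTITIONS):
    """Plain hash partitioning: every record with the same key lands together."""
    # crc32 is used because Python's built-in hash() of str varies per process.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def skew_aware_partition(key, record_index, num_partitions=NUM_PARTITIONS):
    """Spread known hot keys round-robin; hash everything else as usual."""
    if key in HOT_KEYS:
        return record_index % num_partitions
    return default_partition(key, num_partitions)


# Skewed input: one key accounts for 96 of the 100 records.
records = ["user_1"] * 96 + ["user_2", "user_3", "user_4", "user_5"]

default_sizes = Counter(default_partition(k) for k in records)
skew_sizes = Counter(skew_aware_partition(k, i) for i, k in enumerate(records))

print("hash partitioning:", dict(default_sizes))
print("skew-aware:       ", dict(skew_sizes))
```

With plain hashing one partition receives all 96 "user_1" records, which is the slow 1 percent of tasks. With the skew-aware version those records are split evenly across the 4 partitions. Note that for a join, the other side would also have to be matched against the spread-out keys, which this sketch does not cover.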
Pedro
On Thu, Jun 16, 2016 at 6:55 PM, Selvam Raman wrote:
Hi,
What is skewed data?
I read that if the data is skewed during a join, the job takes a long time to
finish (99 percent of the tasks finish in seconds, while the remaining
1 percent take minutes to hours).
How can I handle skewed data in Spark?
Thanks,
Selvam R
+91-97877-87724