Re: Skew data

2016-06-17 Thread Pedro Rodriguez
spread data across partitions evenly. In most cases, calling repartition is enough to solve the problem. If you have a special case, you might need to create your own custom partitioner.

Pedro

On Thu, Jun 16, 2016 at 6:55 PM, Selvam Raman wrote:
> Hi,
>
> What is skew data.
>
> I read t
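To make the reply's point about spreading data evenly concrete, here is a minimal sketch in plain Python (no Spark required) of why a single hot key overloads one partition and how a key-aware partitioning scheme can even things out. Everything in it is an assumption for illustration: the dataset, the names `partition_sizes`, `by_key`, `salt_keys`, and `by_salted_key`, and the use of key salting, which is one common approach and not something the thread itself describes. In PySpark, a function like these could in principle be passed as the `partitionFunc` argument of `RDD.partitionBy`.

```python
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_sizes(keys, partition_func, num_partitions=NUM_PARTITIONS):
    """Return the number of records that land in each partition."""
    counts = Counter(partition_func(k) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def by_key(key):
    """Deterministic stand-in for a hash partitioner: sum of the key's bytes."""
    return sum(key.encode("utf-8"))


def salt_keys(keys, num_salts=NUM_PARTITIONS):
    """Pair each key with a rotating salt so one hot key becomes several keys."""
    return [(k, i % num_salts) for i, k in enumerate(keys)]


def by_salted_key(salted):
    """Partition function for (key, salt) pairs produced by salt_keys."""
    key, salt = salted
    return by_key(key) + salt


# Hypothetical skewed dataset: one key accounts for 97 of 100 records.
keys = ["hot"] * 97 + ["a", "b", "c"]

plain_sizes = partition_sizes(keys, by_key)
salted_sizes = partition_sizes(salt_keys(keys), by_salted_key)

print("partition sizes by key:       ", plain_sizes)
print("partition sizes by salted key:", salted_sizes)
```

With plain key-based partitioning, nearly all records fall into the one partition that owns the hot key, which matches the symptom of a few tasks running far longer than the rest. Note that when salting is used for a join, the other side of the join has to be replicated once per salt value so that matching rows still meet; that step is omitted here.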

Skew data

2016-06-16 Thread Selvam Raman
Hi,

What is skewed data? I read that if the data is skewed during a join, the job takes a long time to finish (99 percent of the tasks finish in seconds, while the remaining 1 percent take minutes to hours). How should skewed data be handled in Spark?

Thanks,
Selvam R
+91-97877-87724