You can also do something similar to what is described in [1]. The basic idea is to apply two hash functions to each key and assign each record to the less loaded of the two candidate workers.
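A minimal sketch of that idea in plain Python, not taken from [1] and not tied to any Spark API. The names `_hash` and `assign`, the seeds "h1"/"h2", and the use of MD5 as the hash are all assumptions made for illustration:

```python
import hashlib


def _hash(key: str, seed: str, num_workers: int) -> int:
    # Hypothetical helper: a seeded hash mapped onto a worker index.
    digest = hashlib.md5((seed + ":" + key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_workers


def assign(keys, num_workers):
    # load[i] counts the records sent to worker i so far.
    load = [0] * num_workers
    assignment = []
    for key in keys:
        # Each key has two candidate workers, one per hash function.
        a = _hash(key, "h1", num_workers)
        b = _hash(key, "h2", num_workers)
        # Send the record to whichever candidate is less loaded right now.
        target = a if load[a] <= load[b] else b
        load[target] += 1
        assignment.append(target)
    return assignment, load


# A single very hot key is split across at most two workers.
assignment, load = assign(["hot"] * 1000, 8)
```

Note that records with the same key can now land on either of two workers, so for a join the matching rows from the other side would have to be sent to both candidates.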
Cheers,
Anis

[1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf

On Fri, 17 Feb 2017 at 01:34, Yong Zhang <[email protected]> wrote:

> Yes. You have to change your key, or as BigData term, "adding salt".
>
> Yong
>
> ------------------------------
> *From:* Gourav Sengupta <[email protected]>
> *Sent:* Thursday, February 16, 2017 11:11 AM
> *To:* user
> *Subject:* skewed data in join
>
> Hi,
>
> Is there a way to do multiple reducers for joining on skewed data?
>
> Regards,
> Gourav
