Hi,
We ran into a case where, as the source table grew, the number of reducers
changed from 30 to 31, and the shuffle became severely unbalanced. Let me
first explain the problem with using 31 as the number of reducers.
Say we have a SQL query that groups by two fields, "a" and "b".
hashCode(a,b) = hashCode(a) + hashCode(b) * 31;
reduceNo = hashCode(a,b) % 31
         = (hashCode(a) + hashCode(b) * 31) % 31
         = hashCode(a) % 31
The hashCode(b) * 31 term is a multiple of 31 and drops out, so the reducer
is chosen by field "a" alone. If the values of "a" are unevenly
distributed, the shuffle can become unbalanced.
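
To make the effect concrete, here is a minimal sketch in Java. It is not the
engine's actual partitioner: the class and method names are hypothetical, the
hash values are small non-negative integers so that int overflow does not come
into play, and the reducer is assumed to be the non-negative remainder of the
combined hash. It counts how many reducers receive any rows when "a" has only
3 distinct values and "b" has 1000.

```java
import java.util.Arrays;

public class ReducerSkewDemo {
    // Combined hash as written in this message: hashCode(a) + hashCode(b) * 31.
    static int combinedHash(int hashA, int hashB) {
        return hashA + hashB * 31;
    }

    // Assumed partitioning rule (hypothetical): non-negative remainder
    // of the combined hash divided by the number of reducers.
    static int reducerFor(int hashA, int hashB, int numReducers) {
        return Math.floorMod(combinedHash(hashA, hashB), numReducers);
    }

    // Count the (a, b) pairs landing on each reducer when hashCode(a)
    // ranges over 0..distinctA-1 and hashCode(b) over 0..distinctB-1.
    static int[] histogram(int distinctA, int distinctB, int numReducers) {
        int[] counts = new int[numReducers];
        for (int a = 0; a < distinctA; a++) {
            for (int b = 0; b < distinctB; b++) {
                counts[reducerFor(a, b, numReducers)]++;
            }
        }
        return counts;
    }

    // Number of reducers that receive at least one pair.
    static long nonEmpty(int[] counts) {
        return Arrays.stream(counts).filter(c -> c > 0).count();
    }

    public static void main(String[] args) {
        System.out.println("31 reducers, busy: " + nonEmpty(histogram(3, 1000, 31)));
        System.out.println("30 reducers, busy: " + nonEmpty(histogram(3, 1000, 30)));
    }
}
```

Under these assumptions, only 3 of the 31 reducers do any work, while with 30
reducers all 30 are used, because the values of "b" still contribute to the
remainder.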
So I suggest never using 31 as the number of reducers. Any thoughts?
