On Mar 17, 2009, at 9:18 AM, Richa Khandelwal wrote:
I was going through the Hadoop FAQs to optimize map/reduce performance. There is a suggestion to set the number of reducers to a prime number closest to the number of nodes, and the number of mappers to a prime number closest to several times the number of nodes in the cluster.
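For reference, in the old org.apache.hadoop.mapred API both counts are set on the JobConf; a minimal, standalone sketch with placeholder values (the numbers below are illustrative, not recommendations):

import org.apache.hadoop.mapred.JobConf;

public class TuningDemo {
  public static void main(String[] args) {
    // Placeholder counts following the FAQ tip; tune to the actual cluster.
    JobConf conf = new JobConf(TuningDemo.class);
    conf.setNumReduceTasks(11);  // the reduce count is honored exactly
    conf.setNumMapTasks(47);     // only a hint; the real map count follows the input splits
    System.out.println("reduces=" + conf.getNumReduceTasks()
        + ", maps=" + conf.getNumMapTasks());
  }
}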
There is no need for the number of reduces to be prime. It only helps if you are using the HashPartitioner and your keys' hash function is too linear. In practice, you usually want to use about 99% of the cluster's reduce capacity.
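To see why, here is a minimal, standalone sketch of what the default HashPartitioner effectively does: it masks the sign bit of the key's hash and takes it modulo the number of reduces. The stride-of-4 hashes below stand in for a "too linear" hash function:

public class PartitionDemo {
  // Mirrors how the default HashPartitioner picks a reduce task.
  static int getPartition(int keyHash, int numReduceTasks) {
    return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
  }

  public static void main(String[] args) {
    // A "too linear" hash: every key hashes to a multiple of 4.
    int[] hashes = {0, 4, 8, 12, 16, 20};
    // With 4 reduces (sharing a factor with the stride) every key lands on
    // partition 0; with 5 (prime, coprime to the stride) the keys spread out.
    for (int numReduces : new int[] {4, 5}) {
      System.out.print(numReduces + " reduces:");
      for (int h : hashes) {
        System.out.print(" " + getPartition(h, numReduces));
      }
      System.out.println();
    }
  }
}

With a reasonable hash function, whether the reduce count is prime makes no difference.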
-- Owen