Owen, one problem with Arun's slide deck is that while it lists the parameters that matter, it doesn't list suggested values for them. Do you have any guidance on that? In particular, the only places I know of that talk about how to set these parameters are http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/ and http://wiki.apache.org/hadoop/FAQ#3.
On Wed, Jun 10, 2009 at 12:14 PM, Owen O'Malley <[email protected]> wrote:
> Take a look at Arun's slide deck on Hadoop performance:
>
> http://bit.ly/EDCg3
>
> It is important to get io.sort.mb large enough, the io.sort.factor should
> be closer to 100 instead of 10. I'd also use large block sizes to reduce the
> number of maps. Please see the deck for other important factors.
>
> -- Owen
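
To make the reasoning behind the quoted advice concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from Arun's deck or from Hadoop itself. It assumes one map task per block-sized input split and models the merge as repeatedly combining up to io.sort.factor sorted runs at a time, which is a simplification of what Hadoop actually does. The 100 GB input size and the spill count of 50 are hypothetical numbers chosen only for illustration.

```python
MB = 1024 * 1024
GB = 1024 * MB


def num_map_tasks(input_bytes, block_bytes):
    """Approximate map count, assuming one map per block-sized split."""
    # Ceiling division without going through floats.
    return -(-input_bytes // block_bytes)


def merge_passes(num_runs, sort_factor):
    """Rough number of merge rounds needed to reduce num_runs sorted runs
    to a single run when at most sort_factor runs are merged at once.
    This is a simplified model, not Hadoop's exact merge algorithm."""
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // sort_factor)
        passes += 1
    return passes


if __name__ == "__main__":
    input_size = 100 * GB  # hypothetical job input

    # Larger blocks mean fewer (longer-running) map tasks.
    for block_mb in (64, 128, 256):
        maps = num_map_tasks(input_size, block_mb * MB)
        print(f"block size {block_mb} MB: about {maps} maps")

    # A larger io.sort.factor means fewer merge rounds over the same runs.
    spills = 50  # hypothetical number of sorted runs to merge
    for factor in (10, 100):
        rounds = merge_passes(spills, factor)
        print(f"io.sort.factor={factor}: about {rounds} merge round(s)")
```

Under these assumptions, doubling the block size halves the number of maps, and raising io.sort.factor from 10 to 100 lets 50 runs merge in one round instead of two. The sketch says nothing about what io.sort.mb should be; that depends on the task heap size and map output volume.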
