Hi Stephen,

That will set the maximum heap allowable, but it doesn't necessarily tell Hadoop's internal systems to take advantage of it. There are a number of other settings that adjust performance.

At Cloudera we have a config tool that generates Hadoop configurations with reasonable first-approximation values for your cluster -- check out http://my.cloudera.com and look at the hadoop-site.xml it generates. If you start from there you might find a better parameter space to explore.

Please share back your findings -- we'd love to tweak the tool even more with some external feedback :)
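To give a flavour of the kind of knobs involved (the values below are illustrative first guesses for your 16GB / 8-core nodes, not tuned recommendations): io.sort.mb controls how much of that child heap each map task's sort buffer will actually use, and io.sort.factor controls how many streams get merged at once during sorts. Something like:

  <property>
    <name>io.sort.mb</name>
    <value>256</value>   <!-- map-side sort buffer in MB; default is 100 -->
  </property>

  <property>
    <name>io.sort.factor</name>
    <value>64</value>    <!-- number of streams merged at once; default is 10 -->
  </property>

With a 1GB child heap and 8 map slots per node you have headroom to raise io.sort.mb well past the default, which should pull more of that idle 16GB into actual use rather than OS cache.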
- Aaron

On Wed, Jun 10, 2009 at 7:39 AM, stephen mulcahy <[email protected]> wrote:

> Hi,
>
> I'm currently doing some testing of different configurations using the
> Hadoop Sort as follows,
>
> bin/hadoop jar hadoop-*-examples.jar randomwriter
> -Dtest.randomwrite.total_bytes=107374182400 /benchmark100
>
> bin/hadoop jar hadoop-*-examples.jar sort /benchmark100 rand-sort
>
> The only changes I've made from the standard config are the following in
> conf/mapred-site.xml
>
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx1024M</value>
> </property>
>
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>8</value>
> </property>
>
> <property>
>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>   <value>4</value>
> </property>
>
> I'm running this on 4 systems, each with 8 processor cores and 4 separate
> disks.
>
> Is there anything else I should change to stress memory more? The systems
> in question have 16GB of memory, but the most that's getting used during a
> run of this benchmark is about 2GB (and most of that seems to be OS
> caching).
>
> Thanks,
>
> -stephen
>
> --
> Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> http://di2.deri.ie http://webstar.deri.ie http://sindice.com
