The JobTracker's memory grows as you run more and more jobs because it keeps data about completed jobs in memory. The maximum number of completed jobs retained per user is controlled by the config variable mapred.jobtracker.completeuserjobs.maximum. You can lower that value to reduce the JobTracker's memory consumption.
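For example, on the JobTracker node you could set something like this in hadoop-site.xml and restart the JobTracker (shown in the same key=value shorthand as your property list below; 25 is just an illustration, and the default is 100 if I remember right):

  mapred.jobtracker.completeuserjobs.maximum=25

As far as I know this only bounds the per-user history the JobTracker holds in memory; the job history files written to disk are unaffected. Two more notes after the quoted thread below.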
On Tue, Mar 3, 2009 at 10:01 AM, Sean Laurent <[email protected]> wrote:

> Interesting... from reading HADOOP-4766, I'm not entirely clear if that
> problem is related to the number of jobs or the number of tasks.
>
> - I'm only running a single job with approximately 900 map tasks, as
>   opposed to the 500-600+ jobs and 100K tasks described in HADOOP-4766.
> - I am seeing increased memory use on the JobTracker.
> - I am seeing elevated memory use over time on the DataNode/TaskTracker
>   machines.
> - Amar's description in HADOOP-4766 from December 6th sounds pretty
>   similar.
>
> I also tried adjusting garbage collection via -XX:+UseParallelGC, but
> that had no noticeable impact.
>
> It also wasn't clear to me what, if anything, I can do to fix or work
> around the problem.
>
> Any advice would be greatly appreciated.
>
> -Sean
>
> On Mon, Mar 2, 2009 at 7:50 PM, Runping Qi <[email protected]> wrote:
>
> > Your problem may be related to
> > https://issues.apache.org/jira/browse/HADOOP-4766
> >
> > Runping
> >
> > On Mon, Mar 2, 2009 at 4:46 PM, Sean Laurent <[email protected]> wrote:
> >
> > > Hi all,
> > > I'm conducting some initial tests with Hadoop to better understand
> > > how well it will handle and scale with some of our specific problems.
> > > As a result, I've written some M/R jobs that are representative of
> > > the work we want to do. I then run the jobs multiple times in a row
> > > (sequentially) to get a rough estimate for average run-time.
> > >
> > > What I'm seeing is really strange... If I run the same job with the
> > > same inputs multiple times, each successive run is slower than the
> > > previous run. If I restart the cluster and re-run the tests, the
> > > first run is fast and then each successive run is slower.
> > >
> > > For example, I just started the cluster and ran the same job 4 times.
> > > The run times for the jobs were as follows: 127 seconds, 177 seconds,
> > > 207 seconds and 218 seconds. I restarted HDFS and M/R, reran the job
> > > 3 more times and got the following run times: 138 seconds, 187
> > > seconds and 221 seconds. :(
> > >
> > > The map task is pretty simple: parse XML files to extract specific
> > > elements. I'm using Cascading and wrote a custom Scheme, which in
> > > turn uses a custom FileInputFormat that treats each file as an
> > > entire record (splitable = false). Each file is then treated as a
> > > separate map task with no reduce step.
> > >
> > > In this case I have an 8-node cluster. 1 node acts as a dedicated
> > > NameNode/JobTracker and 7 nodes run the DataNode/TaskTracker. Each
> > > machine is identical: Dell 1950 with Intel quad-core 2.5GHz, 8GB RAM
> > > and 2 250GB SATA2 drives. All 8 machines are in the same rack
> > > running on a dedicated Force10 gigabit switch.
> > >
> > > I tried enabling JVM reuse via JobConf, which improved performance
> > > for the initial few runs... but each successive job still took
> > > longer than the previous. I also tried increasing the maximum memory
> > > via the mapred.child.java.opts property, but that didn't have any
> > > impact.
> > >
> > > I checked the logs, but I don't see any errors.
> > >
> > > Here's my basic list of configured properties:
> > >
> > > fs.default.name=hdfs://dn01.hadoop.mycompany.com:9000
> > > mapred.job.tracker=dn01.hadoop.mycompany.com:9001
> > > dfs.replication=3
> > > dfs.block.size=1048576
> > > dfs.name.dir=/opt/hadoop/volume1/name,/opt/hadoop/volume2/name
> > > dfs.data.dir=/opt/hadoop/volume1/data,/opt/hadoop/volume2/data
> > > mapred.local.dir=/opt/hadoop/volume1/mapred,/opt/hadoop/volume2/mapred
> > > mapred.child.java.opts=-Xmx1532m
> > >
> > > Frankly I'm stumped. I'm sure there is something obvious that I'm
> > > missing, but I'm totally at a loss right now. Any suggestions would
> > > be ~greatly~ appreciated.
> > >
> > > Thanks!
> > >
> > > -Sean
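Two quick asides on points in the quoted thread:

1) Make sure the JVM reuse setting is actually taking effect. In 0.19 you can enable it per job through JobConf (a sketch; the driver class name is a made-up placeholder, and -1 means reuse each task JVM for an unlimited number of tasks from the same job):

  // in the job driver, assuming the 0.19 mapred API
  // import org.apache.hadoop.mapred.JobConf;
  JobConf conf = new JobConf(MyDriver.class); // MyDriver is a placeholder
  conf.setNumTasksToExecutePerJvm(-1);        // -1 = unlimited reuse; default is 1 (no reuse)

  // or equivalently, as a property:
  // mapred.job.reuse.jvm.num.tasks=-1

2) Nothing wrong with the whole-file-as-one-record approach itself. For reference, a minimal sketch of that pattern against the 0.19 mapred API is below; this is only an illustration of splitable = false, not your actual Cascading Scheme, and all class names are invented:

  import java.io.IOException;

  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.IOUtils;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileSplit;
  import org.apache.hadoop.mapred.InputSplit;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.RecordReader;
  import org.apache.hadoop.mapred.Reporter;

  // Illustrative only: one whole file per record, hence one map task per file.
  public class WholeFileInputFormat
      extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
      return false; // never split: each file becomes exactly one InputSplit
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> getRecordReader(
        InputSplit split, JobConf job, Reporter reporter) throws IOException {
      return new WholeFileRecordReader((FileSplit) split, job);
    }

    static class WholeFileRecordReader
        implements RecordReader<NullWritable, BytesWritable> {

      private final FileSplit split;
      private final JobConf conf;
      private boolean processed = false;

      WholeFileRecordReader(FileSplit split, JobConf conf) {
        this.split = split;
        this.conf = conf;
      }

      public boolean next(NullWritable key, BytesWritable value)
          throws IOException {
        if (processed) {
          return false; // the single record has already been delivered
        }
        byte[] contents = new byte[(int) split.getLength()];
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
          in = fs.open(file);
          IOUtils.readFully(in, contents, 0, contents.length);
          value.set(contents, 0, contents.length); // whole file as one value
        } finally {
          IOUtils.closeStream(in);
        }
        processed = true;
        return true;
      }

      public NullWritable createKey() { return NullWritable.get(); }
      public BytesWritable createValue() { return new BytesWritable(); }
      public long getPos() { return processed ? split.getLength() : 0; }
      public float getProgress() { return processed ? 1.0f : 0.0f; }
      public void close() { }
    }
  }

With splitting disabled you get one map task per file, so ~900 files means ~900 task JVMs per run unless JVM reuse kicks in. That per-task startup cost is worth measuring, but it would not by itself explain runs getting steadily slower, which is why I'd trim the completed-job history first.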
