Could it be that the later jobs ran slower because their tasks took longer to initialize? If so, you may be hitting https://issues.apache.org/jira/browse/HADOOP-4780
Runping

On Tue, Mar 3, 2009 at 2:02 PM, Sean Laurent wrote:

> Hrmmm. According to hadoop-default.xml,
> mapred.jobtracker.completeuserjobs.maximum defaults to 100. So I tried
> setting it to 1, but that had no effect. I still see each successive run
> taking longer than the previous run.
>
> 1) Restart M/R
> 2) Run #1: 142.12 (secs)
> 3) Run #2: 181.96 (secs)
> 4) Run #3: 221.95 (secs)
> 5) Run #4: 281.96 (secs)
>
> I don't think that's the problem here... :(
>
> -S
>
> On Tue, Mar 3, 2009 at 2:33 PM, Runping Qi wrote:
>
> > The jobtracker's memory increased as you ran more and more jobs because
> > the job tracker still kept some data about those completed jobs. The
> > maximum number of completed jobs kept is determined by the config
> > variable mapred.jobtracker.completeuserjobs.maximum.
> > You can lower that to lower the job tracker memory consumption.
> >
> >
> > On Tue, Mar 3, 2009 at 10:01 AM, Sean Laurent wrote:
> >
> > > Interesting... from reading HADOOP-4766, I'm not entirely clear if
> > > that problem is related to the number of jobs or the number of tasks.
> > >
> > > - I'm only running a single job with approximately 900 map tasks as
> > >   opposed to the 500-600+ jobs and 100K tasks described in HADOOP-4766.
> > > - I am seeing increased memory use on the JobTracker.
> > > - I am seeing elevated memory use over time on the
> > >   DataNode/TaskTracker machines.
> > > - Amar's description in HADOOP-4766 from December 6th sounds pretty
> > >   similar.
> > >
> > > I also tried adjusting garbage collection via -XX:+UseParallelGC, but
> > > that had no noticeable impact.
> > >
> > > It also wasn't clear to me what, if anything, I can do to fix or work
> > > around the problem.
> > >
> > > Any advice would be greatly appreciated.
> > >
> > > -Sean
> > >
> > > On Mon, Mar 2, 2009 at 7:50 PM, Runping Qi wrote:
> > >
> > > > Your problem may be related to
> > > > https://issues.apache.org/jira/browse/HADOOP-4766
> > > >
> > > > Runping
> > > >
> > > >
> > > > On Mon, Mar 2, 2009 at 4:46 PM, Sean Laurent wrote:
> > > >
> > > > > Hi all,
> > > > > I'm conducting some initial tests with Hadoop to better understand
> > > > > how well it will handle and scale with some of our specific
> > > > > problems. As a result, I've written some M/R jobs that are
> > > > > representative of the work we want to do. I then run the jobs
> > > > > multiple times in a row (sequentially) to get a rough estimate for
> > > > > average run-time.
> > > > >
> > > > > What I'm seeing is really strange... If I run the same job with the
> > > > > same inputs multiple times, each successive run is slower than the
> > > > > previous run. If I restart the cluster and re-run the tests, the
> > > > > first run is fast and then each successive run is slower.
> > > > >
> > > > > For example, I just started the cluster and ran the same job 4
> > > > > times. The run times for the jobs were as follows: 127 seconds, 177
> > > > > seconds, 207 seconds and 218 seconds. I restarted HDFS and M/R,
> > > > > reran the job 3 more times and got the following run times: 138
> > > > > seconds, 187 seconds and 221 seconds. :(
> > > > >
> > > > > The map task is pretty simple - parse XML files to extract specific
> > > > > elements. I'm using Cascading and wrote a custom Scheme, which in
> > > > > turn uses a custom FileInputFormat that treats each file as an
> > > > > entire record (splitable = false). Each file is then treated as a
> > > > > separate map task with no reduce step.
> > > > >
> > > > > In this case I have an 8 node cluster. 1 node acts as a dedicated
> > > > > NameNode/JobTracker and 7 nodes run the DataNode/TaskTracker. Each
> > > > > machine is identical: Dell 1950 with Intel quad-core 2.5, 8GB RAM
> > > > > and 2 250GB SATA2 drives. All 8 machines are in the same rack
> > > > > running on a dedicated Force10 gigabit switch.
> > > > >
> > > > > I tried enabling JVM reuse via JobConf, which improved performance
> > > > > for the initial few runs... but each successive job still took
> > > > > longer than the previous. I also tried increasing the maximum
> > > > > memory via the mapred.child.java.opts property, but that didn't
> > > > > have any impact.
> > > > >
> > > > > I checked the logs, but I don't see any errors.
> > > > >
> > > > > Here's my basic list of configured properties:
> > > > >
> > > > > fs.default.name=hdfs://dn01.hadoop.mycompany.com:9000
> > > > > mapred.job.tracker=dn01.hadoop.mycompany.com:9001
> > > > > dfs.replication=3
> > > > > dfs.block.size=1048576
> > > > > dfs.name.dir=/opt/hadoop/volume1/name,/opt/hadoop/volume2/name
> > > > > dfs.data.dir=/opt/hadoop/volume1/data,/opt/hadoop/volume2/data
> > > > > mapred.local.dir=/opt/hadoop/volume1/mapred,/opt/hadoop/volume2/mapred
> > > > > mapred.child.java.opts=-Xmx1532m
> > > > >
> > > > > Frankly I'm stumped. I'm sure there is something obvious that I'm
> > > > > missing, but I'm totally at a loss right now. Any suggestions would
> > > > > be ~greatly~ appreciated.
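
For anyone comparing the timings quoted in this thread, the run-over-run slowdown can be worked out with a few lines of Python. This is only a rough sketch: the numbers are the ones Sean reported, while the labels and the helper names (per_run_increase, ratio_to_first) are made up here for illustration and are not part of Hadoop or of anyone's actual test harness.

```python
# Run times in seconds, copied from the messages in this thread.
# The dictionary keys are illustrative labels, not names used by the posters.
reported_runs = {
    "first cluster start": [127, 177, 207, 218],
    "after restarting HDFS and M/R": [138, 187, 221],
    "completeuserjobs.maximum=1": [142.12, 181.96, 221.95, 281.96],
}


def per_run_increase(times):
    """Return how many seconds each run took beyond the run before it."""
    return [round(later - earlier, 2) for earlier, later in zip(times, times[1:])]


def ratio_to_first(times):
    """Return each run time as a multiple of the first run in the series."""
    return [round(t / times[0], 2) for t in times]


if __name__ == "__main__":
    for label, times in reported_runs.items():
        print(label)
        print("  increase per run (s):", per_run_increase(times))
        print("  relative to first run:", ratio_to_first(times))
```

In the series with mapred.jobtracker.completeuserjobs.maximum set to 1, the fourth run takes roughly twice as long as the first, which matches Sean's observation that lowering the setting did not stop the slowdown.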
