Re: Jobs run slower and slower

Sean Laurent Tue, 03 Mar 2009 10:02:17 -0800

Interesting... from reading HADOOP-4766, I'm not entirely clear whether that
problem is related to the number of jobs or the number of tasks.


- I'm only running a single job with approximately 900 map tasks, as opposed
to the 500-600+ jobs and 100K tasks described in HADOOP-4766.
- I am seeing increased memory use on the JobTracker.
- I am seeing elevated memory use over time on the DataNode/TaskTracker
machines.
- Amar's description in HADOOP-4766 from December 6th sounds pretty similar.
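
Some context on the ~900 map task figure: with a non-splittable input format, each file becomes exactly one map task. A splittable format would instead create roughly one task per block-sized chunk, and the configuration quoted below sets dfs.block.size to 1048576 bytes (1 MiB). The following is a rough estimator, not Hadoop code. SplitEstimate and its methods are hypothetical names. It is loosely modeled on FileInputFormat's split loop, including its 10% slop on the last split. It also assumes the split size equals the block size and ignores the minimum split size and the requested number of map tasks.

```java
/**
 * Hypothetical helper (not part of Hadoop): estimates how many map tasks a job
 * gets from a set of input files, with and without splitting.
 * Simplification: split size is assumed to equal the block size.
 */
class SplitEstimate {
    // The last split of a file may run up to 10% over the split size.
    static final double SPLIT_SLOP = 1.1;

    /** Number of splits for one file of fileSize bytes (assumed > 0) when splitting is allowed. */
    static long splitsFor(long fileSize, long splitSize) {
        long splits = 0;
        long remaining = fileSize;
        // Carve off full-size splits while more than 110% of a split remains
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left becomes the final split
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    /** Total map tasks: one per split, or exactly one per file if the format is not splitable. */
    static long mapTasks(long[] fileSizes, long splitSize, boolean splitable) {
        long total = 0;
        for (long size : fileSizes) {
            total += splitable ? splitsFor(size, splitSize) : 1;
        }
        return total;
    }

    public static void main(String[] args) {
        long blockSize = 1048576L; // dfs.block.size from the posted config (1 MiB)
        // Hypothetical file sizes: 3 MiB, 500 KiB, and just over 1 MiB (within the 10% slop)
        long[] sizes = {3L * 1048576, 500L * 1024, 1101004L};
        System.out.println("splitable:     " + mapTasks(sizes, blockSize, true));  // 5
        System.out.println("non-splitable: " + mapTasks(sizes, blockSize, false)); // 3
    }
}
```

With splitting disabled, the task count tracks the file count regardless of file size. So about 900 map tasks corresponds to about 900 input files.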

I also tried adjusting garbage collection via -XX:+UseParallelGC, but that
had no noticeable impact.
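
When a JVM flag seems to have no effect, it is worth checking that it actually reached the JVM in question. In Hadoop, the daemons (JobTracker, TaskTracker, etc.) take their JVM options from hadoop-env.sh, while task JVMs get theirs from mapred.child.java.opts. This minimal sketch uses only the standard java.lang.management API and assumes you can run or log it from inside the JVM you care about, for example at the start of a map task. JvmFlagsCheck is a hypothetical name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports the flags and collectors the current JVM actually started with. */
class JvmFlagsCheck {
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        // Flags passed on the java command line, e.g. -Xmx1532m or -XX:+UseParallelGC
        lines.add("args: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
        // Effective maximum heap in MiB (close to, often slightly below, the -Xmx value)
        lines.add("max heap (MiB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        // With -XX:+UseParallelGC these are typically "PS Scavenge" and "PS MarkSweep"
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add("collector: " + gc.getName()
                    + " (collections=" + gc.getCollectionCount()
                    + ", time ms=" + gc.getCollectionTime() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

If the printed arguments lack -XX:+UseParallelGC, or the max heap is far below the -Xmx1532m from mapred.child.java.opts, the setting was not applied where it was intended. Only a large gap is meaningful, since Runtime.maxMemory() can report somewhat less than -Xmx.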

It also wasn't clear to me what, if anything, I can do to fix or work around
the problem.

Any advice would be greatly appreciated.

-Sean

On Mon, Mar 2, 2009 at 7:50 PM, Runping Qi wrote:

> Your problem may be related to
> https://issues.apache.org/jira/browse/HADOOP-4766
>
> Runping
>
>
> On Mon, Mar 2, 2009 at 4:46 PM, Sean Laurent wrote:
>
> > Hi all,
> > I'm conducting some initial tests with Hadoop to better understand how well
> > it handles and scales with some of our specific problems. To that end,
> > I've written some M/R jobs that are representative of the work we want to
> > do. I then run the jobs multiple times in a row (sequentially) to get a
> > rough estimate of the average run time.
> >
> > What I'm seeing is really strange... If I run the same job with the same
> > inputs multiple times, each successive run is slower than the previous run.
> > If I restart the cluster and re-run the tests, the first run is fast and
> > then each successive run is slower.
> >
> > For example, I just started the cluster and ran the same job 4 times. The
> > run times for the jobs were as follows: 127 seconds, 177 seconds, 207
> > seconds and 218 seconds. I restarted HDFS and M/R, reran the job 3 more
> > times and got the following run times: 138 seconds, 187 seconds and 221
> > seconds. :(
> >
> > The map task is pretty simple: parse XML files to extract specific
> > elements. I'm using Cascading and wrote a custom Scheme, which in turn uses
> > a custom FileInputFormat that treats each file as an entire record
> > (splitable = false). Each file is therefore processed by a separate map
> > task, with no reduce step.
> >
> > In this case I have an 8-node cluster: 1 node acts as a dedicated
> > NameNode/JobTracker and 7 nodes run the DataNode/TaskTracker. Each machine
> > is identical: Dell 1950 with Intel quad-core 2.5 GHz, 8GB RAM and two 250GB
> > SATA2 drives. All 8 machines are in the same rack on a dedicated Force10
> > gigabit switch.
> >
> > I tried enabling JVM reuse via JobConf, which improved performance for the
> > initial few runs... but each successive job still took longer than the
> > previous one. I also tried increasing the maximum heap size via the
> > mapred.child.java.opts property, but that didn't have any impact.
> >
> > I checked the logs, but I don't see any errors.
> >
> > Here's my basic list of configured properties:
> >
> > fs.default.name=hdfs://dn01.hadoop.mycompany.com:9000
> > mapred.job.tracker=dn01.hadoop.mycompany.com:9001
> > dfs.replication=3
> > dfs.block.size=1048576
> > dfs.name.dir=/opt/hadoop/volume1/name,/opt/hadoop/volume2/name
> > dfs.data.dir=/opt/hadoop/volume1/data,/opt/hadoop/volume2/data
> > mapred.local.dir=/opt/hadoop/volume1/mapred,/opt/hadoop/volume2/mapred
> > mapred.child.java.opts=-Xmx1532m
> >
> > Frankly I'm stumped. I'm sure there is something obvious that I'm missing,
> > but I'm totally at a loss right now. Any suggestions would be ~greatly~
> > appreciated.
> >
> > Thanks!
> >
> > -Sean
>
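
The run times quoted above are easier to compare as a slowdown relative to the first run after each (re)start. Here is a small sketch of that arithmetic. RunTimeTrend is a hypothetical name, and the inputs are the run times from the original message.

```java
import java.util.Arrays;

/** Hypothetical helper: summarizes sequential run times of the same job. */
class RunTimeTrend {
    /** Percentage increase of each run over the first run, rounded to a whole percent. */
    static long[] slowdownPercent(int[] seconds) {
        long[] pct = new long[seconds.length];
        for (int i = 0; i < seconds.length; i++) {
            pct[i] = Math.round(100.0 * (seconds[i] - seconds[0]) / seconds[0]);
        }
        return pct;
    }

    /** Plain arithmetic mean of the run times, in seconds. */
    static double mean(int[] seconds) {
        double sum = 0;
        for (int s : seconds) {
            sum += s;
        }
        return sum / seconds.length;
    }

    public static void main(String[] args) {
        int[] firstSession = {127, 177, 207, 218}; // fresh cluster, 4 sequential runs
        int[] secondSession = {138, 187, 221};     // after restarting HDFS and M/R
        System.out.println(Arrays.toString(slowdownPercent(firstSession)));  // [0, 39, 63, 72]
        System.out.println(Arrays.toString(slowdownPercent(secondSession))); // [0, 36, 60]
        System.out.println("means: " + mean(firstSession) + " s, " + mean(secondSession) + " s");
    }
}
```

Both sessions average about 182 seconds (182.25 and 182.0), which says little on its own. The per-run percentages show what the averages hide: by the last run, the job was 72% and 60% slower than the first run of each session.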
