Runping Qi wrote:
Could it be that the later jobs ran slower because their tasks took longer to initialize?
If so, you may hit
https://issues.apache.org/jira/browse/HADOOP-4780
Runping
On Tue, Mar 3, 2009 at 2:02 PM, Sean Laurent <[email protected]> wrote:
Hrmmm. According to hadoop-default.xml, mapred.jobtracker.completeuserjobs.maximum defaults to 100. So I tried setting it to 1, but that had no effect. I still see each successive run taking longer than the previous run.
1) Restart M/R
2) Run #1: 142.12 secs
3) Run #2: 181.96 secs
4) Run #3: 221.95 secs
5) Run #4: 281.96 secs
Yeah, maybe it's not a problem with the JobTracker. Can you check (via the job history) what the best and worst task runtimes are? You can analyze the jobs after they complete.
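For reference, a per-task breakdown can be pulled from the history files with something like:

  bin/hadoop job -history <job-output-dir>

(The exact invocation may differ by Hadoop version; the per-job history pages in the JobTracker web UI show the same timing data.)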
Amar
I don't think that's the problem here... :(
-S
On Tue, Mar 3, 2009 at 2:33 PM, Runping Qi <[email protected]> wrote:
The JobTracker's memory increased as you ran more and more jobs because the JobTracker still keeps some data about completed jobs. The maximum number of completed jobs kept is determined by the config variable mapred.jobtracker.completeuserjobs.maximum. You can lower that value to reduce the JobTracker's memory consumption.
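As a rough sketch (in the same key=value style as the property list further down this thread), the override goes in hadoop-site.xml on the JobTracker machine and requires a JobTracker restart to take effect; the value here is only an illustration:

  mapred.jobtracker.completeuserjobs.maximum=10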
On Tue, Mar 3, 2009 at 10:01 AM, Sean Laurent <[email protected]> wrote:
Interesting... from reading HADOOP-4766, I'm not entirely clear if that problem is related to the number of jobs or the number of tasks.
- I'm only running a single job with approximately 900 map tasks, as opposed to the 500-600+ jobs and 100K tasks described in HADOOP-4766.
- I am seeing increased memory use on the JobTracker.
- I am seeing elevated memory use over time on the DataNode/TaskTracker machines.
- Amar's description in HADOOP-4766 from December 6th sounds pretty similar.
I also tried adjusting garbage collection via -XX:+UseParallelGC, but that had no noticeable impact.
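For context, the flag can go either alongside the existing child JVM options, e.g.

  mapred.child.java.opts=-Xmx1532m -XX:+UseParallelGC

or into HADOOP_OPTS in conf/hadoop-env.sh to cover the daemon JVMs (both placements shown here only as a sketch).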
It also wasn't clear to me what, if anything, I can do to fix or work around the problem.
Any advice would be greatly appreciated.
-Sean
On Mon, Mar 2, 2009 at 7:50 PM, Runping Qi <[email protected]> wrote:
Your problem may be related to
https://issues.apache.org/jira/browse/HADOOP-4766
Runping
On Mon, Mar 2, 2009 at 4:46 PM, Sean Laurent <[email protected]> wrote:
Hi all,
I'm conducting some initial tests with Hadoop to better understand how well it will handle and scale with some of our specific problems. As a result, I've written some M/R jobs that are representative of the work we want to do. I then run the jobs multiple times in a row (sequentially) to get a rough estimate for average run-time.
What I'm seeing is really strange... If I run the same job with the same inputs multiple times, each successive run is slower than the previous run. If I restart the cluster and re-run the tests, the first run is fast and then each successive run is slower.
For example, I just started the cluster and ran the same job 4 times. The run times for the jobs were as follows: 127 seconds, 177 seconds, 207 seconds and 218 seconds. I restarted HDFS and M/R, reran the job 3 more times and got the following run times: 138 seconds, 187 seconds and 221 seconds. :(
The map task is pretty simple: parse XML files to extract specific elements. I'm using Cascading and wrote a custom Scheme, which in turn uses a custom FileInputFormat that treats each file as an entire record (splitable = false). Each file is then treated as a separate map task with no reduce step.
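Roughly, the input format looks like the sketch below. This is a minimal illustration against the 0.19-era org.apache.hadoop.mapred API; the class names are made up for this example rather than taken from the actual code, and the Cascading Scheme that wraps it is omitted:

  import java.io.IOException;

  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.IOUtils;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileSplit;
  import org.apache.hadoop.mapred.InputSplit;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.RecordReader;
  import org.apache.hadoop.mapred.Reporter;

  // Treats each input file as a single (filename, contents) record,
  // so every file becomes exactly one map task.
  public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

    protected boolean isSplitable(FileSystem fs, Path file) {
      return false; // never split: one file == one record == one map task
    }

    public RecordReader<Text, BytesWritable> getRecordReader(
        InputSplit split, JobConf job, Reporter reporter) throws IOException {
      return new WholeFileRecordReader((FileSplit) split, job);
    }

    static class WholeFileRecordReader implements RecordReader<Text, BytesWritable> {
      private final FileSplit split;
      private final JobConf job;
      private boolean done = false;

      WholeFileRecordReader(FileSplit split, JobConf job) {
        this.split = split;
        this.job = job;
      }

      public boolean next(Text key, BytesWritable value) throws IOException {
        if (done) return false;
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(job);
        byte[] contents = new byte[(int) split.getLength()];
        FSDataInputStream in = fs.open(file);
        try {
          IOUtils.readFully(in, contents, 0, contents.length); // read whole file
        } finally {
          in.close();
        }
        key.set(file.toString());
        value.set(contents, 0, contents.length);
        done = true;
        return true;
      }

      public Text createKey() { return new Text(); }
      public BytesWritable createValue() { return new BytesWritable(); }
      public long getPos() { return done ? split.getLength() : 0; }
      public float getProgress() { return done ? 1.0f : 0.0f; }
      public void close() { }
    }
  }

Since isSplitable() returns false, each of the ~900 input files becomes exactly one map task, which matches the behavior described above.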
In this case I have an 8-node cluster: 1 node acts as a dedicated NameNode/JobTracker and 7 nodes run the DataNode/TaskTracker daemons. Each machine is identical: a Dell 1950 with an Intel quad-core 2.5 GHz CPU, 8GB RAM and 2 250GB SATA2 drives. All 8 machines are in the same rack on a dedicated Force10 gigabit switch.
I tried enabling JVM reuse via JobConf, which improved performance for the initial few runs... but each successive job still took longer than the previous one. I also tried increasing the maximum memory via the mapred.child.java.opts property, but that didn't have any impact.
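For what it's worth, the JVM reuse call is roughly the following sketch; -1 means a task JVM may be reused for an unlimited number of tasks within a job, and under the covers it just sets the mapred.job.reuse.jvm.num.tasks property:

  JobConf conf = new JobConf(MyJob.class); // MyJob is a placeholder for the job class
  conf.setNumTasksToExecutePerJvm(-1);     // -1 = no limit on tasks per JVM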
I checked the logs, but I don't see any errors.
Here's my basic list of configured properties:
fs.default.name=hdfs://dn01.hadoop.mycompany.com:9000
mapred.job.tracker=dn01.hadoop.mycompany.com:9001
dfs.replication=3
dfs.block.size=1048576
dfs.name.dir=/opt/hadoop/volume1/name,/opt/hadoop/volume2/name
dfs.data.dir=/opt/hadoop/volume1/data,/opt/hadoop/volume2/data
mapred.local.dir=/opt/hadoop/volume1/mapred,/opt/hadoop/volume2/mapred
mapred.child.java.opts=-Xmx1532m
Frankly I'm stumped. I'm sure there is something obvious that I'm missing, but I'm totally at a loss right now. Any suggestions would be ~greatly~ appreciated.