Runping Qi wrote:
Could it be that the later jobs ran slower because their tasks took longer to initialize?
If so, you may hit
https://issues.apache.org/jira/browse/HADOOP-4780
Runping
On Tue, Mar 3, 2009 at 2:02 PM, Sean Laurent <[email protected]> wrote:
Hrmmm. According to hadoop-default.xml, mapred.jobtracker.completeuserjobs.maximum defaults to 100. So I tried setting it to 1, but that had no effect. I still see each successive run taking longer than the previous run.
1) Restart M/R
2) Run #1: 142.12 secs
3) Run #2: 181.96 secs
4) Run #3: 221.95 secs
5) Run #4: 281.96 secs
Yeah, maybe it's not a problem with the JobTracker. Can you check (via the job history) what the best and worst task runtimes are? You can analyze the jobs after they complete.
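For reference, a per-task breakdown can be pulled from the history files with something like:

  bin/hadoop job -history <job-output-dir>

(The exact invocation may differ by Hadoop version; the per-job history pages in the JobTracker web UI show the same timing data.)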
Amar
I don't think that's the problem here... :(
-S
On Tue, Mar 3, 2009 at 2:33 PM, Runping Qi <[email protected]> wrote:
The JobTracker's memory increased as you ran more and more jobs because the JobTracker still keeps some data about completed jobs. The maximum number of completed jobs kept is determined by the config variable mapred.jobtracker.completeuserjobs.maximum. You can lower that value to reduce the JobTracker's memory consumption.
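As a rough sketch (in the same key=value style as the property list further down this thread), the override goes in hadoop-site.xml on the JobTracker machine and requires a JobTracker restart to take effect; the value here is only an illustration:

  mapred.jobtracker.completeuserjobs.maximum=10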
On Tue, Mar 3, 2009 at 10:01 AM, Sean Laurent <[email protected]> wrote:
Interesting... from reading HADOOP-4766, I'm not entirely clear if that problem is related to the number of jobs or the number of tasks.
- I'm only running a single job with approximately 900 map tasks, as opposed to the 500-600+ jobs and 100K tasks described in HADOOP-4766.
- I am seeing increased memory use on the JobTracker.
- I am seeing elevated memory use over time on the DataNode/TaskTracker machines.
- Amar's description in HADOOP-4766 from December 6th sounds pretty similar.
I also tried adjusting garbage collection via -XX:+UseParallelGC, but that had no noticeable impact.
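For context, the flag can go either alongside the existing child JVM options, e.g.

  mapred.child.java.opts=-Xmx1532m -XX:+UseParallelGC

or into HADOOP_OPTS in conf/hadoop-env.sh to cover the daemon JVMs (both placements shown here only as a sketch).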
It also wasn't clear to me what, if anything, I can do to fix or work around the problem.
Any advice would be greatly appreciated.
-Sean
On Mon, Mar 2, 2009 at 7:50 PM, Runping Qi <[email protected]> wrote:
Your problem may be related to
https://issues.apache.org/jira/browse/HADOOP-4766
Runping
On Mon, Mar 2, 2009 at 4:46 PM, Sean Laurent <[email protected]> wrote:
Hi all,
I'm conducting some initial tests with Hadoop to better understand how well it will handle and scale with some of our specific problems. As a result, I've written some M/R jobs that are representative of the work we want to do. I then run the jobs multiple times in a row (sequentially) to get a rough estimate for average run-time.
What I'm seeing is really strange... If I run the same job with the same inputs multiple times, each successive run is slower than the previous run. If I restart the cluster and re-run the tests, the first run is fast and then each successive run is slower.
For example, I just started the cluster and ran the same job 4 times. The run times for the jobs were as follows: 127 seconds, 177 seconds, 207 seconds and 218 seconds. I restarted HDFS and M/R, reran the job 3 more times and got the following run times: 138 seconds, 187 seconds and 221 seconds. :(
The map task is pretty simple: parse XML files to extract specific elements. I'm using Cascading and wrote a custom Scheme, which in turn uses a custom FileInputFormat that treats each file as an entire record (splitable = false). Each file is then treated as a separate map task with no reduce step.
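Roughly, the input format looks like the sketch below. This is a minimal illustration against the 0.19-era org.apache.hadoop.mapred API; the class names are made up for this example rather than taken from the actual code, and the Cascading Scheme that wraps it is omitted:

  import java.io.IOException;

  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.IOUtils;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileSplit;
  import org.apache.hadoop.mapred.InputSplit;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.RecordReader;
  import org.apache.hadoop.mapred.Reporter;

  // Treats each input file as a single (filename, contents) record,
  // so every file becomes exactly one map task.
  public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

    protected boolean isSplitable(FileSystem fs, Path file) {
      return false; // never split: one file == one record == one map task
    }

    public RecordReader<Text, BytesWritable> getRecordReader(
        InputSplit split, JobConf job, Reporter reporter) throws IOException {
      return new WholeFileRecordReader((FileSplit) split, job);
    }

    static class WholeFileRecordReader implements RecordReader<Text, BytesWritable> {
      private final FileSplit split;
      private final JobConf job;
      private boolean done = false;

      WholeFileRecordReader(FileSplit split, JobConf job) {
        this.split = split;
        this.job = job;
      }

      public boolean next(Text key, BytesWritable value) throws IOException {
        if (done) return false;
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(job);
        byte[] contents = new byte[(int) split.getLength()];
        FSDataInputStream in = fs.open(file);
        try {
          IOUtils.readFully(in, contents, 0, contents.length); // read whole file
        } finally {
          in.close();
        }
        key.set(file.toString());
        value.set(contents, 0, contents.length);
        done = true;
        return true;
      }

      public Text createKey() { return new Text(); }
      public BytesWritable createValue() { return new BytesWritable(); }
      public long getPos() { return done ? split.getLength() : 0; }
      public float getProgress() { return done ? 1.0f : 0.0f; }
      public void close() { }
    }
  }

Since isSplitable() returns false, each of the ~900 input files becomes exactly one map task, which matches the behavior described above.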
In this case I have an 8-node cluster: 1 node acts as a dedicated NameNode/JobTracker and 7 nodes run the DataNode/TaskTracker daemons. Each machine is identical: a Dell 1950 with an Intel quad-core 2.5 GHz CPU, 8GB RAM and 2 250GB SATA2 drives. All 8 machines are in the same rack on a dedicated Force10 gigabit switch.
I tried enabling JVM reuse via JobConf, which improved performance for the initial few runs... but each successive job still took longer than the previous one. I also tried increasing the maximum memory via the mapred.child.java.opts property, but that didn't have any impact.
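For what it's worth, the JVM reuse call is roughly the following sketch; -1 means a task JVM may be reused for an unlimited number of tasks within a job, and under the covers it just sets the mapred.job.reuse.jvm.num.tasks property:

  JobConf conf = new JobConf(MyJob.class); // MyJob is a placeholder for the job class
  conf.setNumTasksToExecutePerJvm(-1);     // -1 = no limit on tasks per JVM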
I checked the logs, but I don't see any errors.
Here's my basic list of configured properties:
fs.default.name=hdfs://dn01.hadoop.mycompany.com:9000
mapred.job.tracker=dn01.hadoop.mycompany.com:9001
dfs.replication=3
dfs.block.size=1048576
dfs.name.dir=/opt/hadoop/volume1/name,/opt/hadoop/volume2/name
dfs.data.dir=/opt/hadoop/volume1/data,/opt/hadoop/volume2/data
mapred.local.dir=/opt/hadoop/volume1/mapred,/opt/hadoop/volume2/mapred
mapred.child.java.opts=-Xmx1532m
Frankly I'm stumped. I'm sure there is something obvious that I'm missing, but I'm totally at a loss right now. Any suggestions would be ~greatly~ appreciated.