Hi all, I'm conducting some initial tests with Hadoop to better understand how well it will handle and scale with some of our specific problems. To that end, I've written some M/R jobs that are representative of the work we want to do, and I run each job multiple times in a row (sequentially) to get a rough estimate of average run time.
What I'm seeing is really strange: if I run the same job with the same inputs multiple times, each successive run is slower than the previous one. If I restart the cluster and re-run the tests, the first run is fast again and then each successive run slows down. For example, I just started the cluster and ran the same job 4 times; the run times were 127, 177, 207, and 218 seconds. I then restarted HDFS and M/R, reran the job 3 more times, and got 138, 187, and 221 seconds. :(

The map task is pretty simple: parse XML files and extract specific elements. I'm using Cascading and wrote a custom Scheme, which in turn uses a custom FileInputFormat that treats each file as a single record (isSplitable() returns false). Each file becomes its own map task, and there is no reduce step. A simplified sketch of the input format pattern is below.

The cluster has 8 nodes: 1 node acts as a dedicated NameNode/JobTracker, and the other 7 run the DataNode/TaskTracker. Each machine is identical: a Dell 1950 with a 2.5GHz Intel quad-core CPU, 8GB RAM, and 2 250GB SATA2 drives. All 8 machines are in the same rack on a dedicated Force10 gigabit switch.

I tried enabling JVM reuse via JobConf (second snippet below), which improved performance for the first few runs, but each successive job still took longer than the previous one. I also tried increasing the maximum heap via the mapred.child.java.opts property, but that had no impact. I checked the logs, but I don't see any errors.

Here's my basic list of configured properties:

    fs.default.name=hdfs://dn01.hadoop.mycompany.com:9000
    mapred.job.tracker=dn01.hadoop.mycompany.com:9001
    dfs.replication=3
    dfs.block.size=1048576
    dfs.name.dir=/opt/hadoop/volume1/name,/opt/hadoop/volume2/name
    dfs.data.dir=/opt/hadoop/volume1/data,/opt/hadoop/volume2/data
    mapred.local.dir=/opt/hadoop/volume1/mapred,/opt/hadoop/volume2/mapred
    mapred.child.java.opts=-Xmx1532m
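For reference, here's a simplified sketch of the whole-file input format pattern I described above. These aren't my actual classes (the real ones are wired into a Cascading Scheme); the names and the BytesWritable value type are just illustrative:

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.*;

    // One file == one record == one map task.
    public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

      @Override
      protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // keep each XML file intact
      }

      @Override
      public RecordReader<NullWritable, BytesWritable> getRecordReader(
          InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
      }
    }

    class WholeFileRecordReader
        implements RecordReader<NullWritable, BytesWritable> {

      private final FileSplit split;
      private final JobConf conf;
      private boolean processed = false;

      WholeFileRecordReader(FileSplit split, JobConf conf) {
        this.split = split;
        this.conf = conf;
      }

      // Emits the entire file contents as a single value, exactly once.
      public boolean next(NullWritable key, BytesWritable value) throws IOException {
        if (processed) {
          return false;
        }
        byte[] contents = new byte[(int) split.getLength()];
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
          in = fs.open(file);
          IOUtils.readFully(in, contents, 0, contents.length);
          value.set(contents, 0, contents.length);
        } finally {
          IOUtils.closeStream(in);
        }
        processed = true;
        return true;
      }

      public NullWritable createKey() { return NullWritable.get(); }
      public BytesWritable createValue() { return new BytesWritable(); }
      public long getPos() { return processed ? split.getLength() : 0; }
      public float getProgress() { return processed ? 1.0f : 0.0f; }
      public void close() throws IOException { }
    }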
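And here is roughly how I set the JVM reuse and heap options I mentioned (MyJob is just a placeholder for my actual job class):

    JobConf conf = new JobConf(MyJob.class);
    // -1 = reuse each task JVM for an unlimited number of tasks
    // (this sets mapred.job.reuse.jvm.num.tasks under the covers)
    conf.setNumTasksToExecutePerJvm(-1);
    // bump the per-task child heap
    conf.set("mapred.child.java.opts", "-Xmx1532m");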
Frankly I'm stumped. I'm sure there is something obvious that I'm missing, but I'm totally at a loss right now. Any suggestions would be ~greatly~ appreciated.

Thanks!
-Sean