Mark,

There is a fixed setup cost when using Hadoop: for each task, a new JVM must be spawned. At such a small scale, you won't see any benefit from using MR.
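To put rough numbers on that, here is a small sketch that only does arithmetic on the figures from your message: the two wall-clock times you reported and three of the job counters. The constant and function names are mine, the timings are your approximate values, and reading the extra local I/O as framework overhead (intermediate data being written and re-read between map and reduce) is my assumption, not something the log states.

```python
# Figures copied from the quoted message; names are illustrative only.
MAP_INPUT_BYTES = 69_051_592        # "Map input bytes"
LOCAL_BYTES_READ = 282_509_133      # "Local bytes read"
LOCAL_BYTES_WRITTEN = 376_697_552   # "Local bytes written"
MR_SECONDS = 2.5 * 60               # reported MapReduce run time (approximate)
PLAIN_SECONDS = 0.5 * 60            # reported run time without MapReduce (approximate)


def throughput_mb_per_s(num_bytes, seconds):
    """Input throughput in megabytes (10**6 bytes) per second."""
    return num_bytes / seconds / 1e6


def summarize():
    """Compare the two runs and relate local I/O to the size of the map input."""
    return {
        "mr_mb_per_s": throughput_mb_per_s(MAP_INPUT_BYTES, MR_SECONDS),
        "plain_mb_per_s": throughput_mb_per_s(MAP_INPUT_BYTES, PLAIN_SECONDS),
        "slowdown": MR_SECONDS / PLAIN_SECONDS,
        "local_read_ratio": LOCAL_BYTES_READ / MAP_INPUT_BYTES,
        "local_write_ratio": LOCAL_BYTES_WRITTEN / MAP_INPUT_BYTES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

The ratios show that the job read and wrote several times the map input on the local disk, which a plain file-system pass would not need to do. With only about 69 MB of input, that fixed and per-record overhead dominates the run time.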
J-D

On Mon, Apr 20, 2009 at 12:26 AM, Mark Kerzner <[email protected]> wrote:
> Hi,
>
> I ran a Hadoop MapReduce task in the local mode, reading and writing from
> HDFS, and it took 2.5 minutes. Essentially the same operations on the local
> file system without MapReduce took 1/2 minute. Is this to be expected?
>
> It seemed that the system lost most of the time in the MapReduce operation,
> such as after these messages
>
> 09/04/19 23:23:01 INFO mapred.LocalJobRunner: reduce > reduce
> 09/04/19 23:23:01 INFO mapred.JobClient: map 100% reduce 92%
> 09/04/19 23:23:04 INFO mapred.LocalJobRunner: reduce > reduce
>
> it waited for a long time. The final output lines were
>
> 09/04/19 23:24:12 INFO mapred.LocalJobRunner: reduce > reduce
> 09/04/19 23:24:12 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_r_000000_0' done.
> 09/04/19 23:24:12 INFO mapred.TaskRunner: Saved output of task
> 'attempt_local_0001_r_000000_0' to hdfs://localhost/output
> 09/04/19 23:24:13 INFO mapred.JobClient: Job complete: job_local_0001
> 09/04/19 23:24:13 INFO mapred.JobClient: Counters: 13
> 09/04/19 23:24:13 INFO mapred.JobClient: File Systems
> 09/04/19 23:24:13 INFO mapred.JobClient: HDFS bytes read=138103444
> 09/04/19 23:24:13 INFO mapred.JobClient: HDFS bytes written=107357785
> 09/04/19 23:24:13 INFO mapred.JobClient: Local bytes read=282509133
> 09/04/19 23:24:13 INFO mapred.JobClient: Local bytes written=376697552
> 09/04/19 23:24:13 INFO mapred.JobClient: Map-Reduce Framework
> 09/04/19 23:24:13 INFO mapred.JobClient: Reduce input groups=184
> 09/04/19 23:24:13 INFO mapred.JobClient: Combine output records=185
> 09/04/19 23:24:13 INFO mapred.JobClient: Map input records=209
> 09/04/19 23:24:13 INFO mapred.JobClient: Reduce output records=184
> 09/04/19 23:24:13 INFO mapred.JobClient: Map output bytes=91863989
> 09/04/19 23:24:13 INFO mapred.JobClient: Map input bytes=69051592
> 09/04/19 23:24:13 INFO mapred.JobClient: Combine input records=185
> 09/04/19 23:24:13 INFO mapred.JobClient: Map output records=209
> 09/04/19 23:24:13 INFO mapred.JobClient: Reduce input records=184
>
