Hello Abhishek,

Unless you have modified conf/mapred-site.xml, MapReduce will use the configuration values specified in $HADOOP_HOME/src/mapred/mapred-default.xml. In that file, mapred.map.tasks is configured as 2, and because of this your job is running 2 map tasks (the default entry is reproduced below).
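If you want the job to run with a single map task, one option would be to override the value in conf/mapred-site.xml. This is only a sketch: the property name and semantics come from mapred-default.xml, and keep in mind that FileInputFormat treats this value as a hint, so the actual number of splits also depends on the number of input files and the dfs block size.

  <!-- goes inside the <configuration> element of conf/mapred-site.xml -->
  <property>
    <name>mapred.map.tasks</name>
    <value>1</value>
    <description>Request one map task per job; FileInputFormat treats
      this as a hint, not a hard limit.</description>
  </property>

The same thing can be done per job in code with JobConf.setNumMapTasks(1) before submitting.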
For reference, here is the default entry in mapred-default.xml:

  <property>
    <name>mapred.map.tasks</name>
    <value>2</value>
    <description>The default number of map tasks per job. Ignored when
      mapred.job.tracker is "local".</description>
  </property>

Hope this helps.
- Ravi

On 3/24/10 7:27 PM, "abhishek sharma" <absha...@usc.edu> wrote:

I realized that I made a mistake in my earlier post, so here is the correct one.

I have a job ("loadgen") with only 1 input, (say) part-00000, of size 1368654 bytes. When I submit this job, I get the following output:

  INFO mapred.FileInputFormat: Total input paths to process : 1

However, in the JobTracker log, I see the following entry:

  Split info for job:job_201003131110_0043 with 2 splits

and subsequently 2 map tasks are started to process these two splits. The size of the input splits to these 2 map tasks is 6843283, so the input is divided equally into two splits.

My questions are: why are two map tasks created instead of one, and why is the combined size of the two splits greater than the size of my input?

I also noticed that if I run the same job with 2 inputs, (say) part-00000 and part-00001, then only 2 map tasks are created. To my knowledge, the number of map tasks should be the same as the number of inputs.

Thanks,
Abhishek
--