After I ran set mapred.min.split.size=200000000; it kicks off 3 map tasks (the file I have is 500M). So it looks like we need to set mapred.min.split.size, rather than mapred.map.tasks, to control how many maps are kicked off.
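The numbers reported in this thread can be checked with a little arithmetic. The sketch below mirrors the split-size rule of the old `mapred` FileInputFormat as I understand it: split size = max(min size, min(goal size, block size)), where goal size is the total size divided by the requested number of maps, and the last split may be up to 10% oversized. It assumes a single file of exactly 500 MiB and a 64 MiB block size; the function names are hypothetical and nothing here calls Hadoop itself.

```python
# Rough model of how the old mapred FileInputFormat sizes its splits.
# Assumptions (not taken from the thread): one file of exactly 500 MiB,
# 64 MiB HDFS blocks, and a 10% slop on the final split.

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(goal_size, min_size, block_size):
    """Split size = max(min_size, min(goal_size, block_size))."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size, block_size, min_size=1, requested_maps=1):
    """Count the splits (and so map tasks) produced for one splittable file."""
    # mapred.map.tasks only feeds the goal size, so it acts as a hint.
    goal_size = file_size // max(requested_maps, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)

    remaining = file_size
    splits = 0
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # whatever is left becomes the final split
    return splits


FILE_SIZE = 500 * 1024 * 1024   # assumed: exactly 500 MiB
BLOCK_SIZE = 64 * 1024 * 1024   # assumed: 64 MiB blocks

print(count_splits(FILE_SIZE, BLOCK_SIZE))                        # defaults
print(count_splits(FILE_SIZE, BLOCK_SIZE, requested_maps=2))      # mapred.map.tasks=2
print(count_splits(FILE_SIZE, BLOCK_SIZE, min_size=200_000_000))  # mapred.min.split.size=200000000
```

Under these assumptions, asking for 2 maps leaves the split count unchanged, because the goal size (250 MiB) is capped by the block size, while raising the minimum split size to 200000000 bytes yields 3 splits. The 9 maps actually observed differ from this idealized count, presumably because the real file size or file layout differs from the assumed one.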
At 2011-08-25 19:38:30,"Daniel,Wu" <hadoop...@163.com> wrote:

It works after I set it as you said, but it looks like I can't control the number of map tasks: it always uses 9 maps, even if I run set mapred.map.tasks=2;

Kind   | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
map    | 100.00%    | 9         | 0       | 0       | 9        | 0      | 0 / 0
reduce | 100.00%    | 1         | 0       | 0       | 1        | 0      | 0 / 0

At 2011-08-25 06:35:38,"Ashutosh Chauhan" <hashut...@apache.org> wrote:

This may be because CombineHiveInputFormat is combining your splits in one map task. If you don't want that to happen, do:

hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat

2011/8/24 Daniel,Wu<hadoop...@163.com>

I pasted the information below; the map capacity is 6. No matter how I set mapred.map.tasks (for example, to 3), it doesn't work: it always uses 1 map task (please see the completed job information).

Cluster Summary (Heap Size is 16.81 MB/966.69 MB)

Running Map Tasks: 0
Running Reduce Tasks: 0
Total Submissions: 6
Nodes: 3
Occupied Map Slots: 0
Occupied Reduce Slots: 0
Reserved Map Slots: 0
Reserved Reduce Slots: 0
Map Task Capacity: 6
Reduce Task Capacity: 6
Avg. Tasks/Node: 4.00
Blacklisted Nodes: 0
Excluded Nodes: 0

Completed Jobs

Jobid | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information | Diagnostic Info
job_201108242119_0001 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 0 | 0 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0002 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0003 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0004 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
job_201108242119_0005 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
job_201108242119_0006 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA

At 2011-08-24 18:19:38,wd <w...@wdicc.com> wrote:

>What about your total Map Task Capacity?
>you may check it from http://your_jobtracker:50030/jobtracker.jsp
>
>2011/8/24 Daniel,Wu <hadoop...@163.com>:
>> I checked my settings; all have the default values. So per the book
>> "Hadoop: The Definitive Guide", the split size should be 64M. The file
>> size is about 500M, so that's about 8 splits. And from the map job
>> information (after the map job is done), I can see it gets 8 splits from one
>> node. But it still starts only one map task.
>>
>> At 2011-08-24 02:28:18,"Aggarwal, Vaibhav" <vagg...@amazon.com> wrote:
>>
>> If you actually have splittable files you can set the following setting to
>> create more splits:
>>
>> mapred.max.split.size appropriately.
>>
>> Thanks
>>
>> Vaibhav
>>
>> From: Daniel,Wu [mailto:hadoop...@163.com]
>> Sent: Tuesday, August 23, 2011 6:51 AM
>> To: hive
>> Subject: Why a sql only use one map task?
>>
>> I ran the following simple SQL:
>> select count(*) from sales;
>> And the job information shows it only uses one map task.
>>
>> The underlying hadoop has 3 data/data nodes. So I expected hive to kick
>> off 3 map tasks, one on each task node. What can make hive run only one map
>> task? Do I need to set something to kick off multiple map tasks? In my
>> config, I didn't change the hive config.