Re:Re:Re: Re: RE: Why a sql only use one map task?

Daniel,Wu Thu, 25 Aug 2011 05:03:24 -0700

After I set
set mapred.min.split.size=200000000;

it kicks off 3 map tasks (my file is about 500 MB). So it looks like we need to
set mapred.min.split.size, rather than mapred.map.tasks, to control how many
map tasks are kicked off.
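
For anyone checking the arithmetic, here is a rough Python sketch of how the
split count could come out to 3. It assumes the table is read through Hadoop's
old-API (mapred) FileInputFormat, whose split size is
max(minSize, min(goalSize, blockSize)) with a 1.1 slop factor on the last
split. It also assumes a single file of exactly 500 MiB and a 64 MiB block
size. None of those assumptions is confirmed in this thread, and the function
names are made up for illustration.

```python
SPLIT_SLOP = 1.1  # assumed: tolerance the old-API FileInputFormat allows on the last split


def compute_split_size(goal_size, min_size, block_size):
    """Split size rule assumed from the old mapred FileInputFormat."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size, block_size, min_split_size=1, map_tasks_hint=1):
    """Count the splits for one splittable file (hypothetical helper)."""
    # mapred.map.tasks only feeds the "goal" size; it is a hint, not a hard setting.
    goal_size = file_size // max(map_tasks_hint, 1)
    split_size = compute_split_size(goal_size, min_split_size, block_size)

    remaining = file_size
    splits = 0
    # Cut full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final, shorter split.
    if remaining > 0:
        splits += 1
    return splits


MIB = 1024 * 1024
file_size = 500 * MIB   # assumed: "500M" taken as exactly 500 MiB
block_size = 64 * MIB   # assumed: default block size of that era

default_splits = count_splits(file_size, block_size)
tuned_splits = count_splits(file_size, block_size, min_split_size=200000000)
print(default_splits, tuned_splits)
```

Under these assumptions, raising mapred.min.split.size above the block size
makes each split larger, so fewer map tasks are needed. The real count on a
cluster depends on the exact file size and the number of files.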



At 2011-08-25 19:38:30,"Daniel,Wu" <hadoop...@163.com> wrote:

It works after setting that as you said, but it looks like I can't control the
number of map tasks: it always uses 9 maps, even if I set
set mapred.map.tasks=2;


Kind   | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
map    | 100.00%    | 9         | 0       | 0       | 9        | 0      | 0 / 0
reduce | 100.00%    | 1         | 0       | 0       | 1        | 0      | 0 / 0
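
A possible explanation for why mapred.map.tasks=2 has no effect, sketched in
Python. This assumes the old-API (mapred) FileInputFormat rule, where the
requested number of maps only sets a "goal" size that is then capped by the
block size. The 64 MiB block size and 500 MiB total are assumptions, and
split_size_for_hint is a made-up name.

```python
MIB = 1024 * 1024


def split_size_for_hint(total_size, num_maps_hint, min_size, block_size):
    """Split size assumed from the old mapred API: max(min, min(goal, block))."""
    goal_size = total_size // max(num_maps_hint, 1)
    return max(min_size, min(goal_size, block_size))


total_size = 500 * MIB  # assumed total input size
block_size = 64 * MIB   # assumed block size
min_size = 1            # assumed default for mapred.min.split.size

# Try several values of mapred.map.tasks and look at the resulting split size.
sizes = {}
for hint in (1, 2, 3, 9, 20):
    sizes[hint] = split_size_for_hint(total_size, hint, min_size, block_size)
    print(f"mapred.map.tasks={hint:2d} -> split size {sizes[hint] / MIB:.1f} MiB")
```

In this sketch, any hint that asks for fewer maps than there are blocks leaves
the split size at one block, so the hint can only raise the map count, never
lower it. The exact count (9 here) depends on the real file size and file
count, which this thread gives only approximately.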



At 2011-08-25 06:35:38,"Ashutosh Chauhan" <hashut...@apache.org> wrote:
This may be because CombineHiveInputFormat is combining your splits into one
map task. If you don't want that to happen, do:
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;


2011/8/24 Daniel,Wu<hadoop...@163.com>

I have pasted the job information below; the map task capacity is 6. No matter
how I set mapred.map.tasks (for example, to 3), it doesn't work: the job always
uses 1 map task (please see the completed job information).



Cluster Summary (Heap Size is 16.81 MB/966.69 MB)
Running Map Tasks:      0
Running Reduce Tasks:   0
Total Submissions:      6
Nodes:                  3
Occupied Map Slots:     0
Occupied Reduce Slots:  0
Reserved Map Slots:     0
Reserved Reduce Slots:  0
Map Task Capacity:      6
Reduce Task Capacity:   6
Avg. Tasks/Node:        4.00
Blacklisted Nodes:      0
Excluded Nodes:         0


Completed Jobs
Jobid | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information | Diagnostic Info
job_201108242119_0001 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 0 | 0 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0002 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0003 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0004 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
job_201108242119_0005 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
job_201108242119_0006 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA



At 2011-08-24 18:19:38,wd <w...@wdicc.com> wrote:
>What about your total Map Task Capacity?
>You can check it at http://your_jobtracker:50030/jobtracker.jsp

>
>2011/8/24 Daniel,Wu <hadoop...@163.com>:
>> I checked my settings, and all of them have their default values. So, per the
>> book "Hadoop: The Definitive Guide", the split size should be 64 MB. The file
>> size is about 500 MB, so that's about 8 splits. From the map job
>> information (after the map job is done), I can see it gets 8 splits from one
>> node. But it still starts only one map task.
>>
>>
>>
>> At 2011-08-24 02:28:18,"Aggarwal, Vaibhav" <vagg...@amazon.com> wrote:
>>
>> If your files are actually splittable, you can create more splits by setting
>> mapred.max.split.size appropriately.
>>
>>
>>
>> Thanks
>>
>> Vaibhav
>>
>>
>>
>> From: Daniel,Wu [mailto:hadoop...@163.com]
>> Sent: Tuesday, August 23, 2011 6:51 AM
>> To: hive
>> Subject: Why a sql only use one map task?
>>
>>
>>
>>   I ran the following simple SQL:
>> select count(*) from sales;
>> And the job information shows it uses only one map task.
>>
>> The underlying Hadoop cluster has 3 data nodes, so I expected Hive to kick
>> off 3 map tasks, one on each node. What can make Hive run only one map
>> task? Do I need to set something to kick off multiple map tasks? I didn't
>> change the Hive config.
>>
>>
>>
>>
