I always set it, so I'm not sure what the behavior is if it is not set. You 
should probably always set it. See the comments/code in 
CombineFileInputFormat.java for details.
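
For reference, a minimal sketch of what that looks like in a Hive session 
(the byte values are illustrative, not recommendations; the per-node/per-rack 
settings are the related CombineFileInputFormat knobs, not mentioned earlier 
in this thread):

    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    set mapred.max.split.size=268435456;            -- 256 MB upper bound per combined split
    set mapred.min.split.size.per.node=134217728;   -- 128 MB minimum combined per node
    set mapred.min.split.size.per.rack=134217728;   -- 128 MB minimum combined per rack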


From: Junxian Yan [mailto:junxian....@gmail.com]
Sent: Wednesday, June 01, 2011 7:54 PM
To: Steven Wong; user@hive.apache.org
Subject: Re: Does block size determine the number of map tasks

Thx. So that means Hadoop will treat the CombineHiveInputFormat input as one 
split if the split parameters are not set, is that right?

R
On Wed, Jun 1, 2011 at 6:44 PM, Steven Wong <sw...@netflix.com> wrote:
When using CombineHiveInputFormat, parameters such as mapred.max.split.size 
(and others) help determine how the input is split across mappers. Other 
factors include whether your input files are in a splittable format.
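
As a rough illustration (a back-of-the-envelope sketch; some_table and the 
200 MB figure are hypothetical, and it assumes a splittable format such as 
plain text):

    -- cap each combined split at 64 MB (67108864 bytes)
    set mapred.max.split.size=67108864;
    -- over ~200 MB of splittable input, expect roughly ceil(200/64) = 4 splits,
    -- i.e. 4 map tasks rather than 1
    select count(*) from some_table;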

Hope this helps.


From: Junxian Yan [mailto:junxian....@gmail.com]
Sent: Wednesday, June 01, 2011 12:45 AM
To: user@hive.apache.org
Subject: Does block size determine the number of map tasks

I saw this in the Hadoop wiki: http://wiki.apache.org/hadoop/HowManyMapsAndReduces

But in my experiment I see a different result. I set CombineHiveInputFormat 
in Hive, and according to the doc the default block size should be 64 MB, but 
my input files are larger than 64 MB, yet Hadoop still created one map task 
to handle all the data.

Can you help me figure out what is wrong?

R
