I always set it, so I'm not sure what the behavior is if it is not set. You should probably always set it. See the comments/code in CombineFileInputFormat.java for details.
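For reference, here is a minimal sketch of the settings in question, as you would run them from the Hive CLI. The byte values are only illustrative examples, not recommendations:

    SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

    -- Upper bound on a combined split; this is the one I always set.
    SET mapred.max.split.size=268435456;            -- 256 MB (example value)

    -- Lower bounds that control how blocks are combined per node and per rack.
    SET mapred.min.split.size.per.node=134217728;   -- 128 MB (example value)
    SET mapred.min.split.size.per.rack=134217728;   -- 128 MB (example value)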
From: Junxian Yan [mailto:junxian....@gmail.com]
Sent: Wednesday, June 01, 2011 7:54 PM
To: Steven Wong; user@hive.apache.org
Subject: Re: Does block size determine the number of map tasks

Thx. So that means Hadoop will treat the CombineHiveInputFormat input as one split if the split parameters are not set, is that right?

R

On Wed, Jun 1, 2011 at 6:44 PM, Steven Wong <sw...@netflix.com> wrote:

When using CombineHiveInputFormat, parameters such as mapred.max.split.size (and others) help determine how the input is split across mappers. Other factors include whether your input files' format is splittable or not.

Hope this helps.

From: Junxian Yan [mailto:junxian....@gmail.com]
Sent: Wednesday, June 01, 2011 12:45 AM
To: user@hive.apache.org
Subject: Does block size determine the number of map tasks

I saw this in the Hadoop wiki: http://wiki.apache.org/hadoop/HowManyMapsAndReduces

But in my experiment I see a different result. When I set CombineHiveInputFormat in Hive, the default block size should be 64 MB according to the doc, but my input files are more than 64 MB and Hadoop still created only one map task to handle all the data. Can you help me figure out what is wrong?

R
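To illustrate the splittability point above, a hypothetical sketch (the table names, file sizes, and mapper counts are made up for illustration): a table backed by a single gzip file yields one mapper regardless of the split settings, because gzip is not a splittable format, while the same data stored as plain text can be divided across mappers.

    SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    SET mapred.max.split.size=134217728;   -- 128 MB (illustrative)

    -- Hypothetical tables: logs_gz is backed by one 1 GB .gz file, logs_txt
    -- by the same data as plain text. A gzip stream cannot be split, so
    -- logs_gz still gets a single mapper; the plain-text copy can be divided
    -- according to the split size.
    SELECT COUNT(*) FROM logs_gz;    -- runs with 1 map task
    SELECT COUNT(*) FROM logs_txt;   -- runs with roughly 1 GB / 128 MB = 8 map tasks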