I didn't find a configuration property that specifically controls the
merged input size for a mapper; the ones I found apply to map or reduce
output size. For example, "hive.merge.size.smallfiles.avgsize" looks like
what I want, but it actually applies to output. In Pig,
"pig.maxCombinedSplitSize" does a similar job. Is there a similar setting
in Hive?

Thanks!
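
One thing that may be worth trying: CombineHiveInputFormat builds on
Hadoop's CombineFileInputFormat, which reads the split-size properties
below. This is only a sketch, assuming the Hadoop 0.20 property names and a
64 MB block size (67108864 bytes); I have not verified how a given Hive
version treats them, so take the values as placeholders.

```xml
<!-- Assumption: Hadoop 0.20 property names read by CombineFileInputFormat. -->
<!-- Assumption: 67108864 bytes = 64 MB, chosen to match the HDFS block size. -->
<property>
  <name>mapred.max.split.size</name>
  <value>67108864</value>
  <description>Upper bound, in bytes, on the size of one combined
  split</description>
</property>

<property>
  <name>mapred.min.split.size.per.node</name>
  <value>67108864</value>
  <description>Minimum bytes to combine from blocks on the same node before
  leftovers are passed up to the rack level</description>
</property>

<property>
  <name>mapred.min.split.size.per.rack</name>
  <value>67108864</value>
  <description>Minimum bytes to combine from blocks on the same rack before
  leftovers are combined across racks</description>
</property>
```

The same properties can also be set per session with "set name=value;"
before running the query, which makes it easier to experiment.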

On Fri, Apr 8, 2011 at 12:18 PM, V.Senthil Kumar <vaisen2...@yahoo.com> wrote:

>
>
> You can find other related configuration parameters here:
> http://wiki.apache.org/hadoop/Hive/AdminManual/Configuration
> I think you can set the file sizes. I haven't tried it, but I guess it's
> covered on that page.
>
> ------------------------------
> *From:* Michael Jiang <it.mjji...@gmail.com>
> *To:* user@hive.apache.org
> *Cc:* V.Senthil Kumar <vaisen2...@yahoo.com>
> *Sent:* Fri, April 8, 2011 11:56:21 AM
> *Subject:* Re: How to configure Hive to use CombineFileInputFormat in case
> of too many small files
>
> Thanks Kumar.
>
> Are there other settings to fine tune how small files are merged into a
> bigger one that a mapper takes? Basically I want to match the size of a
> merged file to the block size.
>
>
>
> On Fri, Apr 8, 2011 at 11:43 AM, V.Senthil Kumar <vaisen2...@yahoo.com> wrote:
>
>> You can add these lines in hive-site.xml. It creates only one file at the
>> end. Hope it helps.
>>
>> <property>
>>   <name>hive.merge.mapredfiles</name>
>>   <value>true</value>
>>   <description>Merge small files at the end of a map-reduce
>> job</description>
>> </property>
>>
>> <property>
>>   <name>hive.input.format</name>
>>   <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>>   <description>The default input format, if it is not specified, the
>> system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18
>> and 19, whereas it is set to CombineHiveInputFormat for hadoop 20. The user
>> can always overwrite it - if there is a bug in CombineHiveInputFormat, it
>> can always be manually set to HiveInputFormat. </description>
>> </property>
>>
>>
>>
>> ------------------------------
>> *From:* Michael Jiang <it.mjji...@gmail.com>
>> *To:* user@hive.apache.org
>> *Sent:* Fri, April 8, 2011 11:34:58 AM
>> *Subject:* How to configure Hive to use CombineFileInputFormat in case of
>> too many small files
>>
>> I could not find instructions for this. The goal is to avoid the
>> performance issues caused by creating a mapper for every small file. Thanks!
>>
>
>
