I couldn't find a configuration property that specifically controls the merged input size for a mapper; the ones I found only control map or reduce output size. For example, "hive.merge.smallfiles.avgsize" looks like what I want, but it actually applies to output. In Pig, "pig.maxCombinedSplitSize" does a similar job. Is there a similar setting in Hive?
Thanks!

On Fri, Apr 8, 2011 at 12:18 PM, V.Senthil Kumar <vaisen2...@yahoo.com> wrote:

> You can find other related configuration parameters here
> http://wiki.apache.org/hadoop/Hive/AdminManual/Configuration
> I think you can set the file sizes. I haven't tried it but i guess its
> there in this page.
>
> ------------------------------
> *From:* Michael Jiang <it.mjji...@gmail.com>
> *To:* user@hive.apache.org
> *Cc:* V.Senthil Kumar <vaisen2...@yahoo.com>
> *Sent:* Fri, April 8, 2011 11:56:21 AM
> *Subject:* Re: How to configure Hive to use CombineFileInputFormat in case
> of too many small files
>
> Thanks Kumar.
>
> Are there other settings to fine tune how small files are merged into a
> bigger one that a mapper takes? Basically I want to match the size of a
> merged file to the block size.
>
> On Fri, Apr 8, 2011 at 11:43 AM, V.Senthil Kumar <vaisen2...@yahoo.com> wrote:
>
>> You can add these lines in hive-site.xml. It creates only one file at the
>> end. Hope it helps.
>>
>> <property>
>>   <name>hive.merge.mapredfiles</name>
>>   <value>true</value>
>>   <description>Merge small files at the end of a map-reduce
>>   job</description>
>> </property>
>>
>> <property>
>>   <name>hive.input.format</name>
>>   <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>>   <description>The default input format, if it is not specified, the
>>   system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18
>>   and 19, whereas it is set to CombineHiveInputFormat for hadoop 20. The user
>>   can always overwrite it - if there is a bug in CombineHiveInputFormat, it
>>   can always be manually set to HiveInputFormat.
>>   </description>
>> </property>
>>
>> ------------------------------
>> *From:* Michael Jiang <it.mjji...@gmail.com>
>> *To:* user@hive.apache.org
>> *Sent:* Fri, April 8, 2011 11:34:58 AM
>> *Subject:* How to configure Hive to use CombineFileInputFormat in case of
>> too many small files
>>
>> Could not find the instructions regarding this to avoid performance issues
>> when too many mappers have to be created for every small file. Thanks!
>>
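
On the question at the top of the thread (sizing the combined input that each mapper takes): Hadoop's CombineFileInputFormat, which CombineHiveInputFormat builds on, reads its split-size bounds from the mapred.* properties shown below. This is only a minimal sketch for hive-site.xml, assuming a 128 MB HDFS block size (134217728 bytes). The values are illustrative, not recommendations, and whether a particular Hive/Hadoop version honors them should be checked before relying on this.

```xml
<!-- Sketch only: values assume a 128 MB HDFS block size; adjust to your cluster. -->
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
</property>

<!-- Upper bound, in bytes, on the size of one combined split (one mapper's input). -->
<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
</property>

<!-- Minimum bytes to gather into a split from blocks on the same node. -->
<property>
  <name>mapred.min.split.size.per.node</name>
  <value>134217728</value>
</property>

<!-- Minimum bytes to gather into a split from blocks on the same rack. -->
<property>
  <name>mapred.min.split.size.per.rack</name>
  <value>134217728</value>
</property>
```

The same properties can also be tried per session from the Hive CLI before committing them to hive-site.xml, for example: set mapred.max.split.size=134217728;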