Thanks Kumar. Are there other settings to fine-tune how small files are merged into the bigger ones a mapper takes? Basically, I want the size of a merged file to match the block size.
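A sketch of the properties that typically govern those sizes, assuming CombineHiveInputFormat is in use as in the snippet below; exact defaults vary by Hive/Hadoop version, and the 128 MB values (134217728 bytes) are illustrative, not authoritative:

<property>
  <name>hive.merge.size.per.task</name>
  <value>134217728</value>
  <description>Target size of the merged files produced at the end of the job; setting it to the HDFS block size (dfs.block.size, e.g. 128 MB here) aims for one block per merged file.</description>
</property>

<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <value>16000000</value>
  <description>If the average size of a job's output files falls below this threshold, Hive launches an additional map-only job to merge them up to hive.merge.size.per.task.</description>
</property>

<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
  <description>Upper bound on how much data CombineHiveInputFormat packs into a single split, i.e. the input one mapper reads; aligning it with the block size keeps each mapper at roughly one block's worth of small files.</description>
</property>

With these in place the merge step should produce files close to one block each, though whether the merge job runs at all still depends on hive.merge.mapfiles / hive.merge.mapredfiles being enabled.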
On Fri, Apr 8, 2011 at 11:43 AM, V.Senthil Kumar <vaisen2...@yahoo.com> wrote:

> You can add these lines in hive-site.xml. It creates only one file at the
> end. Hope it helps.
>
> <property>
>   <name>hive.merge.mapredfiles</name>
>   <value>true</value>
>   <description>Merge small files at the end of a map-reduce job</description>
> </property>
>
> <property>
>   <name>hive.input.format</name>
>   <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>   <description>The default input format, if it is not specified, the system
>   assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
>   whereas it is set to CombineHiveInputFormat for hadoop 20. The user can
>   always overwrite it - if there is a bug in CombineHiveInputFormat, it can
>   always be manually set to HiveInputFormat.</description>
> </property>
>
> ------------------------------
> *From:* Michael Jiang <it.mjji...@gmail.com>
> *To:* user@hive.apache.org
> *Sent:* Fri, April 8, 2011 11:34:58 AM
> *Subject:* How to configure Hive to use CombineFileInputFormat in case of
> too many small files
>
> Could not find the instructions regarding this to avoid performance issues
> when too many mappers have to be created for every small file. Thanks!