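You can add these lines to hive-site.xml. The first property merges the small 
output files at the end of a map-reduce job, so the job ends with far fewer 
files, often just one. The second makes Hive combine small input files into 
shared splits instead of starting one mapper per file. 
Hope it helps.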

<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>

<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  <description>The default input format. If it is not specified, the system 
assigns it: HiveInputFormat for Hadoop versions 0.17, 0.18 and 0.19, and 
CombineHiveInputFormat for Hadoop 0.20. The user can always override it. If 
there is a bug in CombineHiveInputFormat, it can be manually set to 
HiveInputFormat.</description>
</property>
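
If the merged files still come out too small or too large, a few related 
properties may be worth a look as well. This is only a sketch: the property 
names are standard Hive/Hadoop settings as far as I know, but the values below 
are illustrative assumptions, not recommendations, so please check them against 
hive-default.xml for your version before using them.

<!-- Illustrative values only (assumptions); tune for your cluster. -->
<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>

<property>
  <name>hive.merge.size.per.task</name>
  <!-- Assumed target size in bytes for each merged file -->
  <value>256000000</value>
  <description>Size of merged files at the end of the job</description>
</property>

<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <!-- Assumed threshold in bytes; a merge runs when the average output 
       file size is below this value -->
  <value>16000000</value>
  <description>Average output file size below which a merge job is 
started</description>
</property>

<property>
  <name>mapred.max.split.size</name>
  <!-- Assumed upper bound in bytes on a combined input split -->
  <value>256000000</value>
  <description>Maximum size of a split built by 
CombineHiveInputFormat</description>
</property>

The two hive.merge.* size settings only affect the output side. The split size 
is the one that controls how many small input files end up in the same mapper.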

________________________________
From: Michael Jiang <it.mjji...@gmail.com>
To: user@hive.apache.org
Sent: Fri, April 8, 2011 11:34:58 AM
Subject: How to configure Hive to use CombineFileInputFormat in case of too many small files

I could not find instructions for this. The goal is to avoid the performance 
problems that come from creating a separate mapper for every small file. Thanks!
