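You can add these properties to hive-site.xml, inside its <configuration> element. Setting hive.input.format to CombineHiveInputFormat packs many small input files into each split, so far fewer mappers are launched. hive.merge.mapredfiles then merges the small output files at the end of the map-reduce job, so you end up with a few larger files instead of many small ones. Hope it helps.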
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>

<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  <description>The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombineHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombineHiveInputFormat, it can always be manually set to HiveInputFormat.</description>
</property>

________________________________
From: Michael Jiang <it.mjji...@gmail.com>
To: user@hive.apache.org
Sent: Fri, April 8, 2011 11:34:58 AM
Subject: How to configure Hive to use CombineFileInputFormat in case of too many small files

Could not find the instructions regarding this to avoid performance issues when too many mappers have to be created for every small file. Thanks!
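If you also want to control how large the combined input splits and the merged output files get, the related properties below can go in the same hive-site.xml. This is a minimal sketch that assumes a Hadoop 0.20-era cluster, where CombineHiveInputFormat builds on Hadoop's CombineFileInputFormat and therefore reads the mapred.max.split.size and mapred.min.split.size.per.node/per.rack properties. The byte values are illustrative assumptions, not tuned recommendations. Also check that your Hive release supports hive.merge.smallfiles.avgsize before relying on it.

```xml
<!-- Illustrative values only: adjust to your HDFS block size and data volume. -->

<!-- Input side: how CombineHiveInputFormat groups small files into splits. -->
<property>
  <name>mapred.max.split.size</name>
  <value>256000000</value>
  <description>Upper bound, in bytes, on a split built from several small files</description>
</property>
<property>
  <name>mapred.min.split.size.per.node</name>
  <value>128000000</value>
  <description>Minimum size, in bytes, of a split built from blocks on a single node</description>
</property>
<property>
  <name>mapred.min.split.size.per.rack</name>
  <value>128000000</value>
  <description>Minimum size, in bytes, of a split built from blocks within a single rack</description>
</property>

<!-- Output side: when and how Hive merges small result files. -->
<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>
<property>
  <name>hive.merge.size.per.task</name>
  <value>256000000</value>
  <description>Size of merged files at the end of the job</description>
</property>
<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <value>16000000</value>
  <description>If the average output file size is below this, Hive runs an extra job to merge the output into bigger files</description>
</property>
```

As I understand Hadoop's CombineFileInputFormat, leaving mapred.max.split.size unset (0) lets it pack all of a node's small files into one very large split. Setting an upper bound keeps the number of mappers from dropping so far that each mapper has too much work. Keep the per-node and per-rack minimums at or below the maximum.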