Thanks for the suggestion. The query created just one result file.

Before trying this query, I also found another way to make this work: I added
the following properties to hive-site.xml, and that worked as well. It also
created just one result file.


<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>

<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  <description>The default input format, if it is not specified, the system 
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, 
whereas it is set to CombineHiveInputFormat for hadoop 20. The user can always 
overwrite it - if there is a bug in CombineHiveInputFormat, it can always be 
manually set to HiveInputFormat. </description>
</property>
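
For a one-off query, the same two properties could probably be set for the
current session only, instead of editing hive-site.xml. This is a minimal
sketch, assuming your Hive version allows both properties to be overridden
with SET; it has not been verified against the setup described in this thread.

```sql
-- Session-level equivalents of the two hive-site.xml properties.
-- Assumption: both properties can be overridden per session with SET.
SET hive.merge.mapredfiles=true;
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Same query as in the original question, unchanged.
INSERT OVERWRITE LOCAL DIRECTORY
'/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
SELECT host, identity, user, time, request
FROM raw_apachelog
WHERE ds = '2011-03-22-001500';
```

Settings applied this way last only for the session, so other jobs on the
cluster keep their defaults.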



----- Original Message ----
From: Jov <zhao6...@gmail.com>
To: user@hive.apache.org
Sent: Tue, March 29, 2011 10:22:32 PM
Subject: Re: INSERT OVERWRITE LOCAL DIRECTORY -- Why it creates multiple files

Try adding a LIMIT:

INSERT OVERWRITE LOCAL DIRECTORY
'/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
Select host, identity, user, time, request
from raw_apachelog
where ds = '2011-03-22-001500' limit 32;


2011/3/30 V.Senthil Kumar <vaisen2...@yahoo.com>:
> Hello,
>
> I have a hive query which does a simple select and writes the results to a
> local file system.
>
>
> For example, a query like this,
>
> INSERT OVERWRITE LOCAL DIRECTORY
> '/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
> Select host, identity, user, time, request
> from raw_apachelog
> where ds = '2011-03-22-001500';
>
> Now this creates two files under the apachetest folder. This table has only 32
> rows. Is there any way I can make Hive create only a single file?
>
>
> Appreciate your help :)
>
> Thanks,
> Senthil
>
