Thank you, guys.
I will try this later.
Sorry for the additional questions:
if I do this, could the merged file become too big? Does Hive have a config
to control the max file size? Can Hive automatically split files?

On Thu, Nov 15, 2012 at 6:20 PM, Роман Павленко
<pavlenko.roman....@gmail.com> wrote:
> Example:
> insert overwrite table my_table PARTITION (year=2012,month=9,day=4) select
> `data`, `timestamp`, `hour`, `minute`, `second`  from my_table WHERE
> year=2012 AND month=9 AND day=4;
>
>
>
>
> 2012/11/15 Bejoy KS <bejoy...@yahoo.com>
>>
>> Hi Chen
>>
>> You can do it in Hive as well. Enable Hive merge and INSERT OVERWRITE the
>> partition once again with SELECT *.
>>
>> hive.merge.mapfiles=true
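>>
>> A minimal sketch of how this could look in a Hive session, reusing the
>> table and column names from the example earlier in this thread. The size
>> values are illustrative assumptions, not tuned recommendations, and the
>> defaults may differ between Hive versions:
>>
>> ```sql
>> -- Merge small output files at the end of a map-only job.
>> SET hive.merge.mapfiles=true;
>> -- Also merge at the end of a map-reduce job (only matters if the query
>> -- needs a reduce phase).
>> SET hive.merge.mapredfiles=true;
>> -- Assumed target size of each merged file, in bytes (about 256 MB).
>> SET hive.merge.size.per.task=256000000;
>> -- Assumed threshold: the merge step runs when the average output file
>> -- size is below this many bytes (about 16 MB).
>> SET hive.merge.smallfiles.avgsize=16000000;
>>
>> -- Rewrite the partition onto itself so the merge step can run.
>> INSERT OVERWRITE TABLE my_table PARTITION (year=2012, month=9, day=4)
>> SELECT `data`, `timestamp`, `hour`, `minute`, `second`
>> FROM my_table
>> WHERE year=2012 AND month=9 AND day=4;
>> ```
>>
>> The non-partition columns are listed explicitly because a plain SELECT *
>> would also return the partition columns, which a static PARTITION clause
>> does not expect.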
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: "Bejoy KS" <bejoy...@yahoo.com>
>> Date: Thu, 15 Nov 2012 08:10:12
>> To: <user@hive.apache.org>
>> Reply-To: user@hive.apache.org
>> Subject: Re: Can I merge files after I loaded them into hive?
>>
>> Hi Chen
>>
>> You can use Flume for ingestion into HDFS. Flume takes care of the file
>> sizes, combines the files, and stores them as one large file. This is a
>> better approach.
>>
>> You can also write a custom MR job to merge these files in HDFS. Use
>> CombineFileInputFormat and start a map-only job with an identity mapper,
>> with the split size set to the required large file size.
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Cheng Su <scarcer...@gmail.com>
>> Date: Thu, 15 Nov 2012 16:03:44
>> To: <user@hive.apache.org>
>> Reply-To: user@hive.apache.org
>> Subject: Can I merge files after I loaded them into hive?
>>
>> Hi, all.
>>
>> Can I merge files after I loaded them into hive?
>> This is my situation:
>>
>> There is a log table partitioned by date, which stores the nginx access
>> logs.
>> The raw log files are loaded into Hive every hour.
>> For now, a single log file is small, say 10 MB or even smaller.
>> So there are 24 small files in one partition.
>> This is inefficient in my opinion, and will consume more Hadoop heap space.
>> That's why I want to merge the small files.
>>
>> Can Hive merge those files automatically?
>> Or does Hive provide some tools to merge files?
>> Or can I just use hadoop dfs -cat to do that?
>>
>> --
>>
>> Regards,
>> Cheng Su
>
>



-- 

Regards,
Cheng Su
