Thank you, guys. I will try this later. Sorry for the additional questions: if I do this, could the merged file become too big? Does Hive have a config setting to control the maximum file size? Can Hive split files automatically?
On Thu, Nov 15, 2012 at 6:20 PM, Роман Павленко <pavlenko.roman....@gmail.com> wrote:
> Example:
> insert overwrite table my_table PARTITION (year=2012,month=9,day=4) select
> `data`, `timestamp`, `hour`, `minute`, `second` from my_table WHERE
> year=2012 AND month=9 AND day=4;
>
> 2012/11/15 Bejoy KS <bejoy...@yahoo.com>
>>
>> Hi Cheng
>>
>> You can do it in Hive as well. Enable Hive merge and INSERT OVERWRITE the
>> partition once again with SELECT *.
>>
>> SET hive.merge.mapfiles=true;
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: "Bejoy KS" <bejoy...@yahoo.com>
>> Date: Thu, 15 Nov 2012 08:10:12
>> To: <user@hive.apache.org>
>> Reply-To: user@hive.apache.org
>> Subject: Re: Can I merge files after I loaded them into hive?
>>
>> Hi Cheng
>>
>> You can use Flume for ingestion into HDFS. Flume takes care of the file
>> sizes: it combines the incoming data and stores it as one large file. This
>> is a better approach.
>>
>> You can also write a custom MapReduce job to merge these files in HDFS. Use
>> CombineFileInputFormat and run a map-only job with an identity mapper, with
>> the split size set to the desired size of the merged file.
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Cheng Su <scarcer...@gmail.com>
>> Date: Thu, 15 Nov 2012 16:03:44
>> To: <user@hive.apache.org>
>> Reply-To: user@hive.apache.org
>> Subject: Can I merge files after I loaded them into hive?
>>
>> Hi, all.
>>
>> Can I merge files after I have loaded them into Hive?
>> This is my situation:
>>
>> There is a log table, partitioned by date, that stores the nginx access
>> logs. The raw log files are loaded into Hive every hour.
>> At the moment a single log file is small, say 10 MB or even smaller,
>> so there are 24 small files in one partition.
>> In my opinion this is inefficient, and it will consume more Hadoop heap.
>> That's why I want to merge the small files.
>>
>> Can Hive merge those files automatically?
>> Or does Hive provide some tools to merge files?
>> Or can I just use hadoop dfs -cat to do that?
>>
>> --
>>
>> Regards,
>> Cheng Su
>
--
Regards,
Cheng Su
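Combining the two Hive replies in this thread (the hive.merge.mapfiles setting and Роман's INSERT OVERWRITE example), a minimal HiveQL sketch for compacting a single daily partition might look like the following. It assumes the table name my_table, the year/month/day partition columns, and the data columns from Роман's example; substitute your own schema and partition values. The non-partition columns are listed explicitly rather than using SELECT *, because on a partitioned table SELECT * also returns the partition columns (year, month, day), and Hive would reject the insert for having more columns than the target partition.

```sql
-- Sketch only: table, partition and column names are taken from the
-- example earlier in this thread; adjust them to your own table.

-- Ask Hive to merge small output files at the end of a map-only job.
SET hive.merge.mapfiles=true;

-- Rewrite one day's partition in place. Only the non-partition columns
-- are selected; SELECT * would also return year/month/day.
INSERT OVERWRITE TABLE my_table PARTITION (year=2012, month=9, day=4)
SELECT `data`, `timestamp`, `hour`, `minute`, `second`
FROM my_table
WHERE year=2012 AND month=9 AND day=4;
```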