Hi Cheng,

You can do it in Hive as well. Enable Hive's merge option and INSERT OVERWRITE the partition once again with SELECT *.
hive.merge.mapfiles=true

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: "Bejoy KS" <bejoy...@yahoo.com>
Date: Thu, 15 Nov 2012 08:10:12
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: Can I merge files after I loaded them into hive?

Hi Cheng,

You can use Flume for ingestion into HDFS. Flume takes care of the file sizes, combines the files, and stores them as one large file. This is a better approach.

You can also have custom MR jobs merge these files in HDFS. Use CombineFileInputFormat and start a map-only job with an identity mapper, with the split size set to the required large file size.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Cheng Su <scarcer...@gmail.com>
Date: Thu, 15 Nov 2012 16:03:44
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Can I merge files after I loaded them into hive?

Hi, all.

Can I merge files after I loaded them into Hive?

This is my situation: There is a log table partitioned by date, which stores the nginx access logs. The raw log files are loaded into Hive every hour. For now, a single log file is small, say 10 MB or even smaller, so there are 24 small files in one partition. This is inefficient in my opinion, and will consume more Hadoop heap. That's why I want to merge the small files.

Can Hive merge those files automatically? Or does Hive provide some tools to merge files? Or can I just use hadoop dfs -cat to do that?

--
Regards,
Cheng Su
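
For reference, the Hive-side suggestion at the top of this thread (enable merging, then rewrite the partition onto itself) might look roughly like the sketch below. The table name access_log, the partition column dt, the date value, and the size thresholds are all made up for illustration and are not from the thread. One caveat: with a static PARTITION (dt='...') clause, SELECT * also returns the partition column and the column counts no longer match, so this sketch assumes a dynamic-partition insert instead. Trying it on a copy of the table first is probably wise.

```sql
-- Sketch only: access_log, dt, and the date below are hypothetical names/values.

-- Merge small output files of map-only jobs (the setting named in the reply).
SET hive.merge.mapfiles=true;
-- Also merge output files of jobs that have a reduce phase.
SET hive.merge.mapredfiles=true;

-- Assumed thresholds, in bytes; tune them for your cluster.
-- Target size of the merged files:
SET hive.merge.size.per.task=256000000;
-- Run the merge step when the average output file is smaller than this:
SET hive.merge.smallfiles.avgsize=16000000;

-- Dynamic partitioning: SELECT * returns dt as the last column,
-- which Hive then uses to pick the target partition.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Rewrite one day's partition onto itself so the merge step can combine its files.
INSERT OVERWRITE TABLE access_log PARTITION (dt)
SELECT * FROM access_log WHERE dt = '2012-11-15';
```

The WHERE clause limits the rewrite to a single day, so only that partition is overwritten. Whether the 24 hourly files end up as one file depends on the thresholds above and on the total size of the partition.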