Re: Small files under SequenceFile table partition directories

2015-11-11 Thread Chetna C
You can reduce the processing overhead by using a splittable compression format for your Hive table data, such as 4mc. Alternatively, you can use Hadoop's getmerge utility to merge the small files periodically. Thanks, Chetna Chaudhari On 11 November 2015 at 10:56, reveen joe w

Small files under SequenceFile table partition directories

2015-11-10 Thread reveen joe
Hi, Most of our Hive tables are SequenceFile tables, and there are currently many small files, ranging from *1-4 MB*, under the partition directories (created by insert-overwrite). I am assuming this is due to two reasons: 1. Some of our tables are bucketed, and so individual files are created for each b