In general it is recommended to have millions of large files rather than billions of small files in Hadoop, since the NameNode keeps metadata for every file in memory.
Please describe your issue in more detail. For example:
- How are you planning to consume the data stored in this partitioned table?
- Are you looking for storage or performance optimizations?

Thanks,
Saurabh

Sent from my iPhone, please avoid typos.

> On 05-May-2014, at 3:33 pm, Shushant Arora <shushantaror...@gmail.com> wrote:
>
> I have a hive table in which data is populated from an RDBMS on a daily basis.
>
> After the map-reduce job, each mapper writes its data into a hive table partitioned at the month level.
>
> The issue is that when the daily job runs, it fetches the last day's data, and each mapper writes its output to a separate file. Should I merge those files into a single one?
>
> What file format should I use? Is a sequence file or text better?
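Since the question is about merging per-mapper output files, one common approach is to let Hive merge small output files itself at the end of the job. The sketch below uses Hive's `hive.merge.*` settings; the table and partition names are hypothetical, and the size thresholds are illustrative defaults that you would tune for your cluster:

```sql
-- Ask Hive to merge small output files when the job finishes.
SET hive.merge.mapfiles=true;        -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;     -- merge outputs of map-reduce jobs
SET hive.merge.smallfiles.avgsize=128000000;  -- merge if avg output file is below ~128 MB
SET hive.merge.size.per.task=256000000;       -- target size of each merged file

-- Hypothetical table/partition names, for illustration only:
INSERT OVERWRITE TABLE sales PARTITION (month='2014-05')
SELECT * FROM sales_staging;
```

With these settings Hive launches an extra merge step after the main job, so the daily load produces a few large files per partition instead of one small file per mapper.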