In general it is recommended to have millions of large files rather than billions of small files in Hadoop, since the NameNode keeps metadata for every file in memory.
Please describe your issue in more detail. For example:
- How are you planning to consume the data stored in this partitioned table?
- Are you looking for storage or performance optimizations?

Thanks,
Saurabh

Sent from my iPhone, please avoid typos.

> On 05-May-2014, at 3:33 pm, Shushant Arora <shushantaror...@gmail.com> wrote:
>
> I have a hive table in which data is populated from an RDBMS on a daily basis.
>
> After the map-reduce job, each mapper writes its data into a hive table partitioned at the month level.
>
> The issue is that when the daily job runs, it fetches the last day's data, and each mapper writes its output to a separate file. Should I merge those files into a single one?
>
> What file format should I use? Is a sequence file or text better?
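Since the question is about merging per-mapper output files, one common approach is to let Hive merge small output files itself at the end of the job. The sketch below uses Hive's `hive.merge.*` settings; the table and partition names are hypothetical, and the size thresholds are illustrative defaults that you would tune for your cluster:

```sql
-- Ask Hive to merge small output files when the job finishes.
SET hive.merge.mapfiles=true;        -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;     -- merge outputs of map-reduce jobs
SET hive.merge.smallfiles.avgsize=128000000;  -- merge if avg output file is below ~128 MB
SET hive.merge.size.per.task=256000000;       -- target size of each merged file

-- Hypothetical table/partition names, for illustration only:
INSERT OVERWRITE TABLE sales PARTITION (month='2014-05')
SELECT * FROM sales_staging;
```

With these settings Hive launches an extra merge step after the main job, so the daily load produces a few large files per partition instead of one small file per mapper.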