Sammy,

This is not the exact remedy you were looking for, but my company
open-sourced our file crusher utility:

http://www.jointhegrid.com/hadoop_filecrush/index.jsp

We use it to good effect to turn many small files into one. It works
with text files, sequence files, and custom writables.
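In the meantime, one trick that often helps with the dynamic-partition
small-file problem is to force a reduce stage so that each (org_id, day)
partition is written by a single reducer. Here is a rough sketch against
the query you posted below; the table and column names are copied from
your mail, but the DISTRIBUTE BY approach is just a suggestion I have
not tried on your data:

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
SET hive.exec.compress.output=true;
INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
PARTITION(org_id, day)
SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
       referral_type, search_engine, us_search_engine,
       keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
       pages_viewed, entry_page, page_types,
       org_id, day
FROM daily_conversions_without_rank_table
-- hash all rows for a given (org_id, day) to one reducer, so each
-- partition is written as one large file instead of one file per mapper
DISTRIBUTE BY org_id, day;

The trade-off is an extra shuffle, but it sidesteps the merge job
entirely.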
Edward

On Friday, October 15, 2010, Sammy Yu <s...@brightedge.com> wrote:
> Hi,
> I have a dynamic partition query which generates quite a few small
> files which I would like to merge:
>
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.exec.dynamic.partition=true;
> SET hive.exec.compress.output=true;
> SET io.seqfile.compression.type=BLOCK;
> SET hive.merge.size.per.task=256000000;
> SET hive.merge.smallfiles.avgsize=16000000000;
> SET hive.merge.mapfiles=true;
> SET hive.merge.mapredfiles=true;
> SET hive.mergejob.maponly=true;
> INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
> PARTITION(org_id, day)
> SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
> referral_type, search_engine, us_search_engine,
> keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
> pages_viewed,
> entry_page, page_types,
> org_id, day
> FROM daily_conversions_without_rank_table;
>
> I am running the latest version from trunk with HIVE-1622, but it
> seems like I just can't get the post-merge process to happen. I have
> raised hive.merge.smallfiles.avgsize. I'm wondering if the filtering
> at runtime is causing the merge process to be skipped. Attached are
> the hive output and log files.
>
> Thanks,
> Sammy