Sammy,

This is not the exact remedy you were looking for, but my company
open-sourced our file crusher utility:

http://www.jointhegrid.com/hadoop_filecrush/index.jsp

We use it to good effect to turn many small files into one. It works
with text files, sequence files, and custom writables.
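In the meantime, one trick that often helps with the dynamic-partition
small-file problem is to force a reduce stage so that each (org_id, day)
partition is written by a single reducer. Here is a rough sketch against
the query you posted below; the table and column names are copied from
your mail, but the DISTRIBUTE BY approach is just a suggestion I have
not tried on your data:

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
SET hive.exec.compress.output=true;
INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
PARTITION(org_id, day)
SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
       referral_type, search_engine, us_search_engine,
       keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
       pages_viewed, entry_page, page_types,
       org_id, day
FROM daily_conversions_without_rank_table
-- hash all rows for a given (org_id, day) to one reducer, so each
-- partition is written as one large file instead of one file per mapper
DISTRIBUTE BY org_id, day;

The trade-off is an extra shuffle, but it sidesteps the merge job
entirely.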
Edward

On Friday, October 15, 2010, Sammy Yu <s...@brightedge.com> wrote:
> Hi,
> I have a dynamic partition query which generates quite a few small
> files which I would like to merge:
>
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.exec.dynamic.partition=true;
> SET hive.exec.compress.output=true;
> SET io.seqfile.compression.type=BLOCK;
> SET hive.merge.size.per.task=256000000;
> SET hive.merge.smallfiles.avgsize=16000000000;
> SET hive.merge.mapfiles=true;
> SET hive.merge.mapredfiles=true;
> SET hive.mergejob.maponly=true;
> INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
> PARTITION(org_id, day)
> SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
> referral_type, search_engine, us_search_engine,
> keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
> pages_viewed,
> entry_page, page_types,
> org_id, day
> FROM daily_conversions_without_rank_table;
>
> I am running the latest version from trunk with HIVE-1622, but it
> seems like I just can't get the post-merge process to happen. I have
> raised hive.merge.smallfiles.avgsize. I'm wondering if the filtering
> at runtime is causing the merge process to be skipped. Attached are
> the hive output and log files.
>
> Thanks,
> Sammy