The output file shows only 2 jobs (the MapReduce job and the move 
task). This indicates that the plan does not have merge enabled. A plan with 
merge should consist of a ConditionalTask and 2 subtasks (an MR task and a 
move task). Can you send the plan of the query? 
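
In case it helps, the plan can be printed by prefixing the statement with EXPLAIN. This is only a sketch: it assumes you run it in the same session, after the same SET statements as in your mail, so that the plan reflects those settings. If merge is enabled, I would expect a conditional stage to appear in addition to the map-reduce and move stages; the exact stage names and numbering depend on the Hive version.

```sql
-- Run after the same SET statements used for the real query,
-- so the printed plan reflects the merge settings.
-- EXPLAIN only prints the plan; it does not execute the INSERT.
EXPLAIN
INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
PARTITION(org_id, day)
SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
       referral_type, search_engine, us_search_engine,
       keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
       pages_viewed,
       entry_page, page_types,
       org_id, day
FROM daily_conversions_without_rank_table;
```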

One thing I noticed is that you are using Amazon EMR. I'm not sure whether 
the map-only merge works there, since SET hive.mergejob.maponly=true requires 
CombineHiveInputFormat (only available in Hadoop 0.20, and someone reported 
that some distributions of Hadoop don't support it). So another thing you can 
try is to remove this setting.
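
As a rough sketch of that second experiment, the settings from your mail would look like this with the map-only line taken out. The commented-out alternative, setting the flag to false explicitly, is my assumption of how to force the regular merge job if the map-only merge happens to be the default in your build; I have not verified that on EMR.

```sql
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=16000000000;
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
-- hive.mergejob.maponly is intentionally not set here.
-- Assumption: if the map-only merge is on by default in your build,
-- turn it off explicitly instead:
-- SET hive.mergejob.maponly=false;
```

The INSERT OVERWRITE statement itself stays unchanged.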

On Oct 15, 2010, at 1:43 PM, Sammy Yu wrote:

> Hi,
>  I have a dynamic partition query which generates quite a few small
> files which I would like to merge:
> 
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.exec.dynamic.partition=true;
> SET hive.exec.compress.output=true;
> SET io.seqfile.compression.type=BLOCK;
> SET hive.merge.size.per.task=256000000;
> SET hive.merge.smallfiles.avgsize=16000000000;
> SET hive.merge.mapfiles=true;
> SET hive.merge.mapredfiles=true;
> SET hive.mergejob.maponly=true;
> INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
> PARTITION(org_id, day)
> SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
> referral_type, search_engine, us_search_engine,
> keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
> pages_viewed,
> entry_page, page_types,
> org_id, day
> FROM daily_conversions_without_rank_table;
> 
> I am running the latest version from trunk with HIVE-1622, but it
> seems like I just can't get the post merge process to happen. I have
> raised hive.merge.smallfiles.avgsize.  I'm wondering if the filtering
> at runtime is causing the merge process to be skipped.  Attached are
> the hive output and log files.
> 
> 
> Thanks,
> Sammy
> <hive_output.txt><hive_job_log_root_201010151114_2037492391.txt>
