Hi guys,

Thanks for the response. I tried running without hive.mergejob.maponly and got the same result. I've attached the EXPLAIN EXTENDED output. I am running this query on EC2 boxes, but not on EMR; Hive is running on top of a Hadoop 0.20.2 setup.
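For reference, the retry described above can be sketched as the original session with only `hive.mergejob.maponly` dropped. Prefixing the statement with `EXPLAIN EXTENDED` prints the compiled plan (stages and their dependencies) instead of running the insert. All settings, table names, and columns are copied from the original message; nothing here is new configuration.

```sql
-- Same settings as the original query, minus hive.mergejob.maponly
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=16000000000;
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;

-- Print the plan instead of executing the insert
EXPLAIN EXTENDED
INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
PARTITION(org_id, day)
SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
       referral_type, search_engine, us_search_engine,
       keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
       pages_viewed, entry_page, page_types,
       org_id, day
FROM daily_conversions_without_rank_table;
```

Going by Ning's description, a plan with merging enabled should contain a conditional stage with two child stages (a merge MR job and a move task). If the plan still shows only the MapReduce job and the move task, that suggests the merge was never compiled into the plan, rather than being skipped at runtime because of filtering.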

Thanks,
Sammy

On Fri, Oct 15, 2010 at 5:58 PM, Ning Zhang <nzh...@facebook.com> wrote:
> The output file shows it only has 2 jobs (the MapReduce job and the move
> task). This indicates that the plan does not have merge enabled. Merge should
> consist of a ConditionalTask and 2 subtasks (an MR task and a move task).
> Can you send the plan of the query?
>
> One thing I noticed is that you are using Amazon EMR. I'm not sure if this
> is enabled, since SET hive.mergejob.maponly=true requires
> CombineHiveInputFormat (only available in Hadoop 0.20, and someone reported
> that some distributions of Hadoop don't support it). So another thing you
> can try is to remove this setting.
>
> On Oct 15, 2010, at 1:43 PM, Sammy Yu wrote:
>
>> Hi,
>> I have a dynamic partition query which generates quite a few small
>> files which I would like to merge:
>>
>> SET hive.exec.dynamic.partition.mode=nonstrict;
>> SET hive.exec.dynamic.partition=true;
>> SET hive.exec.compress.output=true;
>> SET io.seqfile.compression.type=BLOCK;
>> SET hive.merge.size.per.task=256000000;
>> SET hive.merge.smallfiles.avgsize=16000000000;
>> SET hive.merge.mapfiles=true;
>> SET hive.merge.mapredfiles=true;
>> SET hive.mergejob.maponly=true;
>> INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
>> PARTITION(org_id, day)
>> SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
>> referral_type, search_engine, us_search_engine,
>> keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
>> pages_viewed,
>> entry_page, page_types,
>> org_id, day
>> FROM daily_conversions_without_rank_table;
>>
>> I am running the latest version from trunk with HIVE-1622, but it
>> seems like I just can't get the post-merge process to happen. I have
>> raised hive.merge.smallfiles.avgsize. I'm wondering if the filtering
>> at runtime is causing the merge process to be skipped. Attached are
>> the Hive output and log files.
>>
>>
>> Thanks,
>> Sammy
>> <hive_output.txt><hive_job_log_root_201010151114_2037492391.txt>
>
>
-- 
Chief Architect, BrightEdge
email: s...@brightedge.com | mobile: 650.539.4867 | fax: 650.521.9678 | address: 1850 Gateway Dr Suite 400, San Mateo, CA 94404
explain.log
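Ning's other point is that `hive.mergejob.maponly=true` requires `CombineHiveInputFormat`. If you want to keep the map-only merge on Hadoop 0.20.2, one option is to set the input format explicitly and print it back to confirm the session picked it up. This is a minimal sketch that assumes your Hive build ships `org.apache.hadoop.hive.ql.io.CombineHiveInputFormat`; it has not been tested against this query and may not by itself restore the missing merge stage.

```sql
-- Assumption: this Hive build includes CombineHiveInputFormat (Hadoop 0.20+ only)
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Print the current value to confirm the session picked it up
SET hive.input.format;

-- Re-enable the map-only merge job alongside the existing merge flags
SET hive.mergejob.maponly=true;
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
```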