Hi,

Has there been any resolution to this? I'm having the same trouble. With Hive 0.6, Hadoop 0.18, and a dynamic partition insert, hive.merge.mapredfiles doesn't work, although it works fine for a static partition insert. Even when I set hive.merge.mapredfiles=true, the jobconf shows it as false for the dynamic partition insert.
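One way to separate "the session has the wrong value" from "the planner never added a merge step" is sketched below. This is a diagnostic sketch, not a fix. `target_table`, `source_table`, `col_a`, `col_b` and `part_col` are hypothetical placeholders, not names from this thread. `SET <property>;` with no value echoes the session's current setting. `EXPLAIN` prints the compiled plan without running the query; when a merge is planned, it should include the conditional stage (a ConditionalTask with a merge MR task and a move task) described later in this thread. Exact EXPLAIN output differs between Hive versions.

```sql
-- Echo the value the current session holds (printed as name=value).
SET hive.merge.mapredfiles;

-- Dynamic partition insert with merging enabled. hive.merge.mapfiles
-- governs map-only jobs; hive.merge.mapredfiles governs jobs with reducers.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;

-- Show the plan only. DISTRIBUTE BY adds a reduce phase, so the
-- mapredfiles setting is the one that applies here. If a merge was
-- planned, a conditional stage should follow the main MapReduce stage.
-- All table and column names are hypothetical placeholders.
EXPLAIN
INSERT OVERWRITE TABLE target_table PARTITION (part_col)
SELECT col_a, col_b, part_col
FROM source_table
DISTRIBUTE BY part_col;
```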
I was reading https://issues.apache.org/jira/browse/HIVE-1307, and it looks like Hadoop 0.20 may be required for this?

Thanks,

On Sat, Oct 16, 2010 at 1:50 AM, Sammy Yu <s...@brightedge.com> wrote:
> Hi guys,
> Thanks for the response. I tried running without
> hive.mergejob.maponly and got the same result. I've attached the EXPLAIN
> EXTENDED output. I am running this query on EC2 boxes, but it's
> not running on EMR. Hive is running on top of a Hadoop 0.20.2 setup.
>
> Thanks,
> Sammy
>
> On Fri, Oct 15, 2010 at 5:58 PM, Ning Zhang <nzh...@facebook.com> wrote:
> > The output file shows only 2 jobs (the MapReduce job and the move
> > task). This indicates that the plan does not have merge enabled. Merge
> > should consist of a ConditionalTask and 2 sub-tasks (an MR task and a
> > move task). Can you send the plan of the query?
> >
> > One thing I noticed is that you are using Amazon EMR. I'm not sure if
> > this is enabled, since SET hive.mergejob.maponly=true requires
> > CombineHiveInputFormat (only available in Hadoop 0.20, and someone
> > reported that some Hadoop distributions don't support it). So one
> > additional thing you can try is removing this setting.
> >
> > On Oct 15, 2010, at 1:43 PM, Sammy Yu wrote:
> >
> > > Hi,
> > > I have a dynamic partition query that generates quite a few small
> > > files, which I would like to merge:
> > >
> > > SET hive.exec.dynamic.partition.mode=nonstrict;
> > > SET hive.exec.dynamic.partition=true;
> > > SET hive.exec.compress.output=true;
> > > SET io.seqfile.compression.type=BLOCK;
> > > SET hive.merge.size.per.task=256000000;
> > > SET hive.merge.smallfiles.avgsize=16000000000;
> > > SET hive.merge.mapfiles=true;
> > > SET hive.merge.mapredfiles=true;
> > > SET hive.mergejob.maponly=true;
> > > INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
> > > PARTITION(org_id, day)
> > > SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
> > > referral_type, search_engine, us_search_engine,
> > > keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
> > > pages_viewed,
> > > entry_page, page_types,
> > > org_id, day
> > > FROM daily_conversions_without_rank_table;
> > >
> > > I am running the latest version from trunk with HIVE-1622, but it
> > > seems like I just can't get the post-merge process to happen. I have
> > > raised hive.merge.smallfiles.avgsize. I'm wondering if the filtering
> > > at runtime is causing the merge process to be skipped. Attached are
> > > the Hive output and log files.
> > >
> > > Thanks,
> > > Sammy
> > > <hive_output.txt><hive_job_log_root_201010151114_2037492391.txt>
>
> --
> Chief Architect, BrightEdge
> email: s...@brightedge.com | mobile: 650.539.4867 | fax: 650.521.9678
> address: 1850 Gateway Dr Suite 400, San Mateo, CA 94404

--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net