Re: Merging small files with dynamic partitions

Sammy Yu Fri, 15 Oct 2010 22:51:00 -0700

Hi guys,
   Thanks for the response.  I tried running without
hive.mergejob.maponly and got the same result.  I've attached the EXPLAIN
EXTENDED output.  I am running this query on EC2 boxes, but not on EMR;
Hive is running on top of a Hadoop 0.20.2 setup.
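
A rough sketch of that rerun, assuming the settings and query from the
original message are otherwise unchanged: hive.mergejob.maponly is
dropped, and the INSERT is prefixed with EXPLAIN EXTENDED to dump the
plan instead of executing it.

```sql
-- Same session settings as the original query, minus hive.mergejob.maponly.
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=16000000000;
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;

-- EXPLAIN EXTENDED prints the plan without running the job.
EXPLAIN EXTENDED
INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
PARTITION(org_id, day)
SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
       referral_type, search_engine, us_search_engine,
       keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
       pages_viewed,
       entry_page, page_types,
       org_id, day
FROM daily_conversions_without_rank_table;
```

If merge were planned, the stage list should show a conditional stage
with a merge job and a move task after the main MapReduce stage, as
described in the quoted reply below.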


Thanks,
Sammy

On Fri, Oct 15, 2010 at 5:58 PM, Ning Zhang <nzh...@facebook.com> wrote:
> The output file shows it has only 2 jobs (the MapReduce job and the move 
> task). This indicates that the plan does not have merge enabled. Merge should 
> consist of a ConditionalTask and 2 subtasks (an MR task and a move task). 
> Can you send the plan of the query?
>
> One thing I noticed is that you are using Amazon EMR. I'm not sure if this 
> is enabled there, since SET hive.mergejob.maponly=true requires 
> CombineHiveInputFormat (only available in Hadoop 0.20, and someone reported 
> that some distributions of Hadoop don't support it). So one additional thing 
> you can try is to remove this setting.
>
> On Oct 15, 2010, at 1:43 PM, Sammy Yu wrote:
>
>> Hi,
>>  I have a dynamic partition query which generates quite a few small
>> files which I would like to merge:
>>
>> SET hive.exec.dynamic.partition.mode=nonstrict;
>> SET hive.exec.dynamic.partition=true;
>> SET hive.exec.compress.output=true;
>> SET io.seqfile.compression.type=BLOCK;
>> SET hive.merge.size.per.task=256000000;
>> SET hive.merge.smallfiles.avgsize=16000000000;
>> SET hive.merge.mapfiles=true;
>> SET hive.merge.mapredfiles=true;
>> SET hive.mergejob.maponly=true;
>> INSERT OVERWRITE TABLE daily_conversions_without_rank_all_table
>> PARTITION(org_id, day)
>> SELECT session_id, permanent_id, first_date, last_date, week, month, quarter,
>> referral_type, search_engine, us_search_engine,
>> keyword, unnormalized_keyword, branded, conversion_meet, goals_meet,
>> pages_viewed,
>> entry_page, page_types,
>> org_id, day
>> FROM daily_conversions_without_rank_table;
>>
>> I am running the latest version from trunk with HIVE-1622, but it
>> seems like I just can't get the post merge process to happen. I have
>> raised hive.merge.smallfiles.avgsize.  I'm wondering if the filtering
>> at runtime is causing the merge process to be skipped.  Attached are
>> the hive output and log files.
>>
>>
>> Thanks,
>> Sammy
>> <hive_output.txt><hive_job_log_root_201010151114_2037492391.txt>
>
>



-- 
Chief Architect, BrightEdge
email: s...@brightedge.com   |   mobile: 650.539.4867  |   fax:
650.521.9678  |  address: 1850 Gateway Dr Suite 400, San Mateo, CA
94404

explain.log
Description: Binary data
