Thanks Steven. That got me past the first bug, but another one pops up when I 
try dynamic partitions with larger tables. I have implemented the same approach 
(mentioned below) on smaller tables successfully, but somehow it fails for 
larger tables.

My larger source table (parameter_def) contains 5 billion rows, which I 
Sqoop'ed into Hive from a DWH. When I try the dynamic partition insert on it 
with the query
INSERT OVERWRITE TABLE parameter_part PARTITION (location)
SELECT p.seq_id, p.lead_id, p.arr_datetime, p.computed_value,
       p.del_date, p.location
FROM parameter_def p;
there are two map reduce jobs triggered, and the first one now runs to 
completion after setting

hive.exec.max.created.files=150000;
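
For context, running EXPLAIN on the same statement should list both stages; I 
assume the second stage is the conditional merge task Hive adds for inserts 
(hive.merge.mapfiles), but I'm not certain of that:

EXPLAIN
INSERT OVERWRITE TABLE parameter_part PARTITION (location)
SELECT p.seq_id, p.lead_id, p.arr_datetime, p.computed_value,
       p.del_date, p.location
FROM parameter_def p;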
The second job, however, just fails outright without even running. Given below 
is the error log.
From the PuTTY console:
2011-06-20 10:40:13,348 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201106061630_0937
Ended Job = 1659539584, job is filtered out (removed at runtime).
Launching Job 2 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201106061630_0938, Tracking URL = 
http://********.com:50030/jobdetails.jsp?jobid=job_201106061630_0938
Kill Command = /usr/lib/hadoop/bin/hadoop job  
-Dmapred.job.tracker=********.com:8021 -kill job_201106061630_0938
2011-06-20 10:42:51,914 Stage-3 map = 100%,  reduce = 100%
Ended Job = job_201106061630_0938 with errors
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask

From the Hive log file:
2011-06-20 10:41:02,293 WARN  mapred.JobClient 
(JobClient.java:copyAndConfigureFiles(649)) - Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
2011-06-20 10:42:51,917 ERROR exec.MapRedTask 
(SessionState.java:printError(343)) - Ended Job = job_201106061630_0938 with 
errors
2011-06-20 10:42:51,938 ERROR ql.Driver (SessionState.java:printError(343)) - 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
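
For reference, these are the limits I understand govern dynamic partition 
inserts in this Hive version. The values below are only illustrative of what I 
could raise; the defaults noted in the comments are my understanding of Hive 
0.7, so please correct me if I have them wrong.

-- global cap on dynamic partitions across the whole job (default 1000, I believe)
set hive.exec.max.dynamic.partitions=10000;
-- cap per mapper/reducer (default 100, I believe)
set hive.exec.max.dynamic.partitions.pernode=300;
-- total number of files the job may create (default 100000)
set hive.exec.max.created.files=150000;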


The Hadoop and Hive versions I'm using are as follows:
Hadoop Version - Hadoop 0.20.2-cdh3u0
Hive Version - Hive 0.7 (lib/hive-hwi-0.7.0-cdh3u0.war)

Please help me figure out what is going wrong with my implementation.

Thank You

Regards
Bejoy.K.S

________________________________
From: Steven Wong <sw...@netflix.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Sent: Sat, June 18, 2011 6:54:34 AM
Subject: RE: Issue on using hive Dynamic Partitions on larger tables


The name of the parameter is actually hive.exec.max.created.files. The wiki has 
a typo, which I’ll fix.
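
So the setting should look like this, for example (the value is just 
illustrative; pick whatever your job needs):

set hive.exec.max.created.files=500000;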
 
 
From: Bejoy Ks [mailto:bejoy...@yahoo.com]
Sent: Thursday, June 16, 2011 9:35 AM
To: hive user group
Subject: Issue on using hive Dynamic Partitions on larger tables
 
Hi Hive Experts
    I'm facing an issue while using Hive dynamic partitions on larger tables. I 
tried out dynamic partitions on smaller tables and they worked fine, but 
unfortunately when I tried the same on a larger table, the map reduce job 
terminates with the following error:

2011-06-16 12:14:28,592 Stage-1 map = 74%,  reduce = 0%
[Fatal Error] total number of created files exceeds 100000. Killing the job.
Ended Job = job_201106061630_0536 with errors
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask

I tried setting the parameter hive.max.created.files to a larger value, but I 
still got the same error:
hive> set hive.max.created.files=500000;
The same error, 'total number of created files exceeds 100000', was thrown even 
after I changed the value to 500000. I suspect the value I set for the config 
parameter is not taking effect, or else I'm setting the wrong parameter for 
this issue. Please advise.
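
As far as I know, entering just the property name (with no value) in the Hive 
CLI echoes its current setting, which should confirm whether the value took 
effect; I'm assuming the CLI behaves this way:

hive> set hive.max.created.files;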

The other parameters I set on the Hive CLI for dynamic partitions are:
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=300;

The HiveQL query I used for the dynamic partition insert is:
INSERT OVERWRITE TABLE parameter_part PARTITION (location)
SELECT p.seq_id, p.lead_id, p.arr_datetime, p.computed_value,
       p.del_date, p.location
FROM parameter_def p;
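
For completeness, the target table DDL is along these lines; the column types 
below are only indicative, not the exact schema:

-- column types are indicative only
CREATE TABLE parameter_part (
  seq_id BIGINT,
  lead_id BIGINT,
  arr_datetime STRING,
  computed_value DOUBLE,
  del_date STRING
)
PARTITIONED BY (location STRING);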

Please help me out in resolving this.

Thank You.

Regards
Bejoy.K.S
