Hi,

I'm trying to run a program on Hadoop.

[Input] a TSV file

My program does the following:
(1) Load the TSV into Hive:
      load data local inpath 'tsvfile' overwrite into table A partition (xx=...)
(2) Pull the last month of data into table B:
      insert overwrite table B
      select a, b, c from A
      where datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))), request_date) <= 30
(3) Run Mahout on the result
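
In case it helps, this is what the filter in step 2 is meant to do (assuming request_date is a date string and ${logdate} is a hivevar like '2012-11-29'):

```sql
-- datediff(end, start) returns end minus start in days, so with
-- logdate = '2012-11-29' the predicate keeps request_date values
-- from 2012-10-30 onward, e.g.:
--   datediff('2012-11-29', '2012-10-30') = 30   -- kept
--   datediff('2012-11-29', '2012-10-29') = 31   -- filtered out
select a, b, c
from A
where datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))), request_date) <= 30;
```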

In step 2 I am trying to retrieve the past month of data from Hive, and this is where my Hadoop job always stops. When I check the job in the web UI, it says:

Diagnostic Info:
# of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201211291541_0262_m_001800

Task attempt_201211291541_0262_m_001800_0 failed to report status for 1802 seconds. Killing!
Error: Java heap space
Task attempt_201211291541_0262_m_001800_2 failed to report status for 1800 seconds. Killing!
Task attempt_201211291541_0262_m_001800_3 failed to report status for 1801 seconds. Killing!



Each Hive table is big, around 6 GB.

(1) Is around 6 GB per Hive table too big for this kind of job?
(2) I've increased my HEAPSIZE to 50 GB, which I think is far more than enough.
Is there anywhere else I can tune?
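
For completeness, these are the knobs I have found so far (assuming Hadoop 1.x property names; the values here are examples, not what I am actually running). They can be set per-session before the INSERT:

```sql
-- Per-task JVM heap. I'm not sure the HEAPSIZE setting even reaches
-- the map task JVMs, so maybe this is the one that matters:
SET mapred.child.java.opts=-Xmx2048m;

-- Time (ms) a task may go without reporting progress before being
-- killed; the log above shows my tasks being killed at ~1800 s.
SET mapred.task.timeout=1800000;
```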


Thank you.



rei

