Hi, I'm trying to run a program on Hadoop.
[Input] a TSV file

My program does the following:

(1) Load the TSV into a partitioned Hive table:

    LOAD DATA LOCAL INPATH 'tsvfile' OVERWRITE INTO TABLE A PARTITION (xx=...);

(2) Populate table B with the past month of data:

    INSERT OVERWRITE TABLE B
    SELECT a, b, c
    FROM A
    WHERE datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))), request_date) <= 30;

(3) Run Mahout on the result.

In step (2) I am trying to retrieve the past month of data from Hive; for example, with logdate = 2012-12-01 the WHERE clause keeps rows whose request_date is on or after 2012-11-01. My Hadoop job always stops at this step. When I check through the browser utility (the JobTracker web UI), it says:

    Diagnostic Info:
    # of failed Map Tasks exceeded allowed limit. FailedCount: 1.
    LastFailedTask: task_201211291541_0262_m_001800

    Task attempt_201211291541_0262_m_001800_0 failed to report status for 1802 seconds. Killing!
    Error: Java heap space
    Task attempt_201211291541_0262_m_001800_2 failed to report status for 1800 seconds. Killing!
    Task attempt_201211291541_0262_m_001800_3 failed to report status for 1801 seconds. Killing!

Each Hive table is big, around 6 GB. Two questions:

(1) Is around 6 GB per Hive table too big?
(2) I've increased the HEAPSIZE to 50 GB, which I think is far more than enough (see the P.S. below for how). Is there anywhere else I can tune?

Thank you.
rei
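P.S. For completeness, here is roughly the kind of change I made for the heap. This is a sketch: I am assuming conf/hadoop-env.sh and the HADOOP_HEAPSIZE variable are what apply on my distribution (the exact file may differ), and the value is in MB:

    # conf/hadoop-env.sh -- maximum heap size for the Hadoop daemons, in MB
    export HADOOP_HEAPSIZE=50000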
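Since the failed attempts mention both a progress timeout (~1800 seconds) and "Java heap space", I wonder whether the per-task JVM settings matter more than the daemon heap. Below is a sketch of what I am considering putting in mapred-site.xml; these are MRv1 property names (which I believe match my cluster, given the task ID format), and the values are untested guesses:

    <!-- mapred-site.xml (sketch; the values are guesses, not tested) -->
    <configuration>
      <!-- heap given to each map/reduce task JVM; as I understand it,
           this, not HADOOP_HEAPSIZE, bounds a map task's memory -->
      <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx2048m</value>
      </property>
      <!-- milliseconds a task may go without reporting progress before
           it is killed; my tasks die after ~1800 s, so this may already
           be set to 1800000 on my cluster -->
      <property>
        <name>mapred.task.timeout</name>
        <value>3600000</value>
      </property>
    </configuration>

Would these be the right knobs, or is there something on the Hive side I should be setting as well?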