[ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948801#comment-13948801
 ] 

Mohammad Kamrul Islam commented on HIVE-6638:
---------------------------------------------

In case anyone is interested: the testing is an involved, carefully 
choreographed process. I tested it as follows:

set mapred.map.tasks.speculative.execution=false;
set mapred.job.map.memory.mb=4096;
set hive.merge.mapfiles=false;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create table load_overwrite (key string, value string) stored as textfile;
load data local inpath '/tmp/data/' into table load_overwrite;
select key from load_overwrite where length(key) > 0;

This assumes /tmp/data contains four copies of kv1.txt, which gives the four 
tasks mentioned below.

Tested against Hadoop 2.3 on a single-node Mac machine, where the four tasks 
run more or less sequentially.
Important: when to kill the MRAM (MRAppMaster)? I killed it when the second 
task finished, but it could be any time before the last one finishes. 
Command used:
jps | grep MRAppMaster | cut -d' ' -f1 | xargs kill
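The kill step can also be wrapped in a small shell helper. This is only a sketch and not part of the patch: the function names mram_pids and kill_mram are made up for illustration, and it assumes jps prints one "<pid> <main class>" pair per line, which is its default output.

```shell
#!/bin/sh
# Print the PID of every MRAppMaster entry found in jps-style output on stdin.
# Matching the second field exactly avoids picking up unrelated processes.
mram_pids() {
  awk '$2 == "MRAppMaster" { print $1 }'
}

# Kill the running application master(s); does nothing if none is found.
# (Hypothetical helper, equivalent in spirit to the one-liner above.)
kill_mram() {
  for pid in $(jps | mram_pids); do
    echo "killing MRAppMaster pid $pid"
    kill "$pid"
  done
}
```

Calling kill_mram once the second task has finished reproduces the timing described above. Unlike piping into xargs, the loop never invokes kill without a PID.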


I was monitoring in two ways:
1. cd HADOOP_LOG_DIR/userlogs/<app-id> and run: grep -R "New Final Path" *
This shows which tasks have completed and written their file to HDFS.
2. Run: hadoop fs -lsr hdfs://localhost:9000/tmp/hive-<ID>/
This shows all the tasks' output during the execution. At the end, it is 
cleaned up.
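The first check can be turned into a simple count, which makes it easier to decide when to kill the MRAM. A rough sketch: count_completed is a hypothetical helper, and it assumes each completed task logs the "New Final Path" line exactly once, which is not verified here.

```shell
#!/bin/sh
# Count the "New Final Path" lines under a log directory.
# $1: directory holding the container logs of one application.
count_completed() {
  # tr strips the padding that some wc implementations (e.g. on Mac) add.
  grep -R "New Final Path" "$1" | wc -l | tr -d ' '
}
```

For example, count_completed HADOOP_LOG_DIR/userlogs/<app-id> should report 2 at the point where the second task has finished, under the assumption stated above.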


Anyway, if you manage to kill the MRAM during the execution, you should see 
only 4 output files. More importantly, you will see that the tasks completed 
before the MRAM was killed are never rerun, and you still get the correct result.

> Hive needs to implement recovery for Application Master restart 
> ----------------------------------------------------------------
>
>                 Key: HIVE-6638
>                 URL: https://issues.apache.org/jira/browse/HIVE-6638
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Mohammad Kamrul Islam
>         Attachments: HIVE-6638.1.patch
>
>
> Currently, if the AM restarts, the whole job is restarted. Although the job, 
> and subsequently the query, would still run to completion, it would be nice 
> if Hive didn't need to redo all the work done under the previous AM.



