Thanks, Rekha, that is really helpful!

Could you, or anyone else, please also help me with the following questions?

1. How can I get the progress score of each task (map or reduce)? Can I read 
it from the log files, either directly or by setting the logging level to 
"debug", or do I need to change the Hadoop source code?

2. For speculative execution, Hadoop looks at the average progress score of the 
map tasks (or of the reduce tasks) and compares each task's progress score with 
that average. If a task's score is less than the average minus 0.2, the task is 
considered a straggler. For example, if there are 10 map tasks, we first compute 
the average progress score of the 10 map tasks, then compare each of the 10 map 
tasks with that average to find the stragglers. Is my understanding of the 
algorithm right? Please correct me if I am wrong.
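To make question 2 concrete, here is a minimal Java sketch of the rule as I 
understand it: average the progress scores (each in [0, 1]) of all tasks of one 
type, then flag every task whose score is below that average minus 0.2. The 
class and method names are hypothetical, not Hadoop API. The extra condition 
that a task must have run for at least one minute is an assumption based on 
published descriptions of Hadoop's speculative scheduler, not part of the 
description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the straggler test; not part of Hadoop. */
class StragglerSketch {

    /** Gap below the average progress score (0.2, as described above). */
    static final double SPECULATIVE_GAP = 0.2;

    /** Assumed minimum run time before a task may be speculated: one minute. */
    static final long MIN_RUNTIME_MS = 60_000L;

    /**
     * Returns the indices of tasks whose progress is below (average - gap)
     * and that have been running for at least MIN_RUNTIME_MS.
     *
     * @param progress  progress score of each task of one type, in [0, 1]
     * @param runtimeMs how long each task has been running, in milliseconds
     */
    static List<Integer> findStragglers(double[] progress, long[] runtimeMs) {
        if (progress.length != runtimeMs.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        List<Integer> stragglers = new ArrayList<>();
        if (progress.length == 0) {
            return stragglers;
        }
        // Step 1: average progress score over all tasks of this type.
        double sum = 0.0;
        for (double p : progress) {
            sum += p;
        }
        double average = sum / progress.length;

        // Step 2: compare every task with (average - gap).
        for (int i = 0; i < progress.length; i++) {
            boolean farBehind = progress[i] < average - SPECULATIVE_GAP;
            boolean ranLongEnough = runtimeMs[i] >= MIN_RUNTIME_MS;
            if (farBehind && ranLongEnough) {
                stragglers.add(i);
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        // Ten map tasks, as in the example: nine at 0.8, one stuck at 0.3.
        double[] progress = {0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.3};
        long[] runtimeMs = new long[progress.length];
        Arrays.fill(runtimeMs, 120_000L); // every task has run for two minutes
        // Average is 0.75, so the threshold is 0.55 and only task 9 is flagged.
        System.out.println(findStragglers(progress, runtimeMs)); // prints [9]
    }
}
```

Note that the average here is taken over exactly the tasks passed in; whether 
Hadoop includes already-completed tasks in its average is not something this 
sketch tries to settle.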

Thanks a lot!
Regards
Chengwei     

----- Original Message -----
From: "Rekha Joshi" <rekha...@yahoo-inc.com>
To: common-dev@hadoop.apache.org
Sent: Friday, November 12, 2010 12:53:25 AM
Subject: Re: about the task statistics in the history directory

Hi Chengwei,

If it helps, the Hadoop tutorial and the configuration files, together with the 
JobHistory* API pages, should give you the main details.
For example:
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobHistory.MapAttempt.html
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobHistory.Keys.html
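As a rough illustration of how those keys show up in the history files, here is 
a minimal Java sketch of a line parser. It assumes each history line is a 
record type (such as MapAttempt) followed by KEY="VALUE" pairs whose keys are 
the JobHistory.Keys names, terminated by " .", with special characters in 
values escaped by a backslash. The class names and the sample line are 
hypothetical; please check the format against your own files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for one job history line; not part of Hadoop. */
class HistoryLineSketch {

    /** One parsed line: the record type and its key/value pairs, in order. */
    static final class Record {
        final String type;
        final Map<String, String> fields;

        Record(String type, Map<String, String> fields) {
            this.type = type;
            this.fields = fields;
        }
    }

    /** Parses a line assumed to look like: Type KEY="VALUE" KEY="VALUE" ... . */
    static Record parse(String line) {
        String s = line.trim();
        int n = s.length();
        int space = s.indexOf(' ');
        // The record type (e.g. Job, Task, MapAttempt) is the first token.
        String type = space < 0 ? s : s.substring(0, space);
        Map<String, String> fields = new LinkedHashMap<>();
        int i = space < 0 ? n : space;
        while (i < n) {
            while (i < n && s.charAt(i) == ' ') {
                i++; // skip the separators between pairs
            }
            int eq = s.indexOf('=', i);
            // Stop at the trailing " ." (or anything that is not KEY="...").
            if (eq < 0 || eq + 1 >= n || s.charAt(eq + 1) != '"') {
                break;
            }
            String key = s.substring(i, eq);
            StringBuilder value = new StringBuilder();
            int j = eq + 2;
            while (j < n && s.charAt(j) != '"') {
                char c = s.charAt(j);
                if (c == '\\' && j + 1 < n) {
                    j++; // assumed escaping: backslash + char stands for char
                    c = s.charAt(j);
                }
                value.append(c);
                j++;
            }
            fields.put(key, value.toString());
            i = j + 1; // move past the closing quote
        }
        return new Record(type, fields);
    }

    public static void main(String[] args) {
        // Hypothetical sample line, for illustration only.
        String line = "MapAttempt TASK_TYPE=\"MAP\" TASKID=\"task_201011120001_0001_m_000000\""
                + " TASK_ATTEMPT_ID=\"attempt_201011120001_0001_m_000000_0\""
                + " START_TIME=\"1289540000000\" TRACKER_NAME=\"tracker_host1\" HTTP_PORT=\"50060\" .";
        Record r = parse(line);
        System.out.println(r.type + " " + r.fields);
    }
}
```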

There is a typo in the API docs at
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobHistory
"JobHistory.ReduceAttempt 
<http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobHistory.ReduceAttempt.html>
          Helper class for logging or reading back events related to start, 
finish or failure of  a Map Attempt on a node."

It should be "Reduce" instead of "Map". Use your judgment. :)

It is just an example that only the code is the gospel truth; the API docs are 
a guiding force.

Thanks & Regards,
/Rekha.

On 11/12/10 7:57 AM, "Wang, Chengwei" <wan...@gatech.edu> wrote:

Hi all,

I wonder whether there is any documentation explaining the terms in the task 
statistics under logs/history/, for example 'SPLITS' and 'MapAttempt'.

Thanks a lot for any enlightenment.

Regards
Chengwei

