Hi Nikhil,

For #2, there is some initial work in a project called Chukwa. Chukwa is
designed to collect Hadoop metrics, job log files, HDFS/MR client trace log
files, and system metrics. From that data, it is possible to reconstruct
state machines describing the health of the Hadoop cluster and to identify
faulty hardware. If you are interested, the research work is in a JIRA at:

https://issues.apache.org/jira/browse/CHUKWA-94

Jiaqi Tan has written a more detailed research paper at:

https://issues.apache.org/jira/secure/attachment/12404723/tan.pdf

There was another research project at Yahoo, based on an AI approach that
applies one-class SVM classification to Hadoop metrics to identify faulty
hardware. The AI approach has a lot of potential, but it has not been
published yet. However, it is on the roadmap for the Chukwa project.

Hope this is useful.

Regards,
Eric

On 2/22/11 9:28 AM, "Nikhil Panpalia" <nik...@cs.utexas.edu> wrote:

Hello everyone,

I'm a graduate student at the University of Texas at Austin. I'm looking for
a research/implementation based project on Hadoop and I came across the list
posted on the wiki page - http://wiki.apache.org/hadoop/ProjectSuggestions.
However, this page was last updated in September 2009, so I'm not sure whether
some of these ideas have already been implemented. I was particularly
interested in the following projects (listed on the wiki page):
1) Sort and Shuffle optimization in the MR framework.
2) Hadoop compatible framework for discovering network topology and
identifying and diagnosing hardware that is not functioning correctly.

Can anyone give me any details about these? Are these projects already in
progress or completed?

Thanks,
Nikhil
