Re: Question on job scheduling

arun kumar Wed, 24 Feb 2010 06:28:27 -0800

Thanks for the reply. I am currently trying to move all the blocks, and their 
replicas, of the input files only to a specified location. That is, just 
before job start-up, check the input files' locations and move their 
blocks and replicas to the desired (highly efficient) data nodes, 
thereby making sure that only these nodes execute the job. (I am assuming 
this because I believe each block will be operated upon by the nearest 
available mapping process only.)
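
The assumption above can be illustrated with a toy sketch. It is plain Java 
and uses no Hadoop classes; pickNode and the node names dn1..dn4 are 
hypothetical, and the rule it encodes (prefer a node that holds a replica, 
otherwise take any node with a free slot) is a simplification, not Hadoop's 
actual scheduler. It shows why replica placement steers where a task runs 
only while the replica-holding nodes have free slots.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: not Hadoop's scheduler, no Hadoop classes involved.
final class LocalitySketch {

    /**
     * Picks the node that runs the map task for one block.
     * Assumption: a replica holder with a free slot is preferred (node-local);
     * otherwise the first node with a free slot is used (remote read).
     * Returns null when no node has a free slot.
     */
    static String pickNode(List<String> replicaHolders, List<String> nodesWithFreeSlots) {
        for (String holder : replicaHolders) {
            if (nodesWithFreeSlots.contains(holder)) {
                return holder; // node-local execution
            }
        }
        // No replica holder is free: fall back to any free node, if one exists.
        return nodesWithFreeSlots.isEmpty() ? null : nodesWithFreeSlots.get(0);
    }

    public static void main(String[] args) {
        // All replicas of the block were moved to the "desired" nodes dn1 and dn2.
        List<String> replicas = Arrays.asList("dn1", "dn2");

        // A desired node is free, so the task lands on it.
        System.out.println(pickNode(replicas, Arrays.asList("dn3", "dn1")));

        // Both desired nodes are busy, so the task runs elsewhere.
        System.out.println(pickNode(replicas, Arrays.asList("dn3", "dn4")));

        // Nothing is free, so the task has to wait.
        System.out.println(pickNode(replicas, Collections.<String>emptyList()));
    }
}
```

Under this model, moving the replicas makes the desired nodes the preferred 
choice, but it does not by itself stop other nodes from running a task when 
the desired ones are busy.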

And in your reply you mentioned that some of the work should be initiated 
from the client. Is it the JobClient class you are talking about?

Thanks,
Arun

--- On Fri, 2/19/10, Wang Xu <gna...@gmail.com> wrote:

From: Wang Xu <gna...@gmail.com>
Subject: Re: Question on job scheduling
To: common-dev@hadoop.apache.org
Date: Friday, February 19, 2010, 7:25 AM

On Thu, Feb 18, 2010 at 12:00 AM, arun kumar <arunkumar_sk...@yahoo.com> wrote:
> My questions are:
> 1. Will such a change improve the performance? Considering the overhead 
> caused by moving the data blocks.

In some special cases it might improve the performance, but it depends
on your application.

> 2. I believe I will have to start from the NameNode to move the blocks. If 
> anyone can give me a brief explanation on the process to implement this or 
> even sources to find information on this it would be very helpful.

I think some of the work might be initiated from the client. Could you
describe what you want to do in detail?
 1. Do you want to specify the datanode that stores particular blocks, or
do you only want some blocks to be located together?
 2. Do you want to specify the location of all the replicas of a block,
or only of one of the replicas?

-- 
Wang Xu
Stephen Leacock  - "I detest life-insurance agents: they always argue
that I shall some day die, which is not so." -
http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html
