Thanks for the reply. I am currently trying to move all the blocks, and their replicas, of the input files only to a specified location. That is, just before job start-up, I want to check where the input files are stored and move their blocks and replicas to the desired, highly efficient data nodes, thereby making sure that only these nodes execute the job (I am assuming this because I believe each block will be operated on only by the nearest available mapping process).
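To make the idea concrete, here is a rough sketch of just the planning step in plain Java, with no Hadoop dependency. Everything in it is hypothetical: the class BlockMovePlanner, its Move type, the block IDs, and the host names are made up for illustration. It assumes the block-to-host map has already been collected somehow (on a real cluster, FileSystem.getFileBlockLocations could be one source) and it only decides which replicas would need to move; it does not move anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: plans which replicas to relocate.
class BlockMovePlanner {

    // One planned relocation of a single replica.
    static final class Move {
        final String blockId;
        final String fromHost;
        final String toHost;

        Move(String blockId, String fromHost, String toHost) {
            this.blockId = blockId;
            this.fromHost = fromHost;
            this.toHost = toHost;
        }

        @Override
        public String toString() {
            return blockId + ": " + fromHost + " -> " + toHost;
        }
    }

    // For every block, pair each replica on a non-preferred host with a
    // preferred host that does not already hold a replica of that block.
    static List<Move> planMoves(Map<String, List<String>> replicaHosts,
                                Set<String> preferredHosts) {
        List<Move> moves = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : replicaHosts.entrySet()) {
            // Preferred hosts still free for this block.
            Set<String> free = new LinkedHashSet<>(preferredHosts);
            free.removeAll(entry.getValue());
            Iterator<String> targets = free.iterator();
            for (String host : entry.getValue()) {
                if (preferredHosts.contains(host)) {
                    continue; // replica is already on a preferred node
                }
                if (!targets.hasNext()) {
                    break; // no preferred node left for this block
                }
                moves.add(new Move(entry.getKey(), host, targets.next()));
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        // Made-up block IDs and data node names.
        Map<String, List<String>> replicaHosts = new LinkedHashMap<>();
        replicaHosts.put("blk_1", Arrays.asList("dn1", "dn4", "dn5"));
        replicaHosts.put("blk_2", Arrays.asList("dn2", "dn3", "dn6"));
        Set<String> preferred =
                new LinkedHashSet<>(Arrays.asList("dn1", "dn2", "dn3"));

        for (Move move : planMoves(replicaHosts, preferred)) {
            System.out.println(move);
        }
    }
}
```

With three replicas per block and only three preferred nodes, every preferred node ends up holding a replica of every input block, so the number of planned moves gives a first estimate of the overhead I asked about in question 1.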
In your reply you mentioned that some of the work should be initiated from the client. Is it the JobClient class you are talking about?

Thanks,
Arun

--- On Fri, 2/19/10, Wang Xu <gna...@gmail.com> wrote:

From: Wang Xu <gna...@gmail.com>
Subject: Re: Question on job scheduling
To: common-dev@hadoop.apache.org
Date: Friday, February 19, 2010, 7:25 AM

On Thu, Feb 18, 2010 at 12:00 AM, arun kumar <arunkumar_sk...@yahoo.com> wrote:
> My questions are:
> 1. Will such a change improve the performance? Considering the overhead
> caused by moving the data blocks.

In some special case, it might improve the performance, but it depends on your application.

> 2. I believe I will have to start from the NameNode to move the blocks. If
> anyone can give me a brief explanation on the process to implement this or
> even sources to find information on this it would be very helpful.

I think some of the work might initiate from client. Could you describe what you want to do in detail?

1 do you want to specify datanode to store special blocks, or only want some blocks are located together?
2 do you want to specify the location of all the replicas of a block, or only want to specify one of the replicas.

--
Wang Xu
Stephen Leacock - "I detest life-insurance agents: they always argue that I shall some day die, which is not so." - http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html