All,

I am new to Hadoop and trying to learn its internals. I have a few questions about something I am trying to implement. I am still learning, so these questions may seem silly or may even be entirely wrong; any guidance is highly appreciated.
I am working with release 0.18.3 and trying to do some job scheduling. That is, instead of placing the data blocks on specific (desired) nodes when the input files are copied into HDFS, I am trying to move the blocks from their original locations to these desired, highly efficient nodes just before job submission.

My questions are:

1. Will such a change improve performance, considering the overhead of moving the data blocks?
2. I believe I will have to start from the NameNode to move the blocks. If anyone can give me a brief explanation of how to implement this, or point me to sources of information on it, that would be very helpful.

Thanks,
Arun
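
P.S. To make question 1 more concrete, here is the rough back-of-envelope model I have in mind. It is only a sketch: the class and method names are hypothetical, it does not use any Hadoop API, and the block count, bandwidth, and job times are made-up numbers rather than measurements. It assumes the blocks are moved one after another over a single link, so the move time is simply total bytes divided by bandwidth.

```java
// Hypothetical back-of-envelope model; not part of Hadoop.
public class BlockMoveEstimate {

    // Seconds needed to move `blocks` blocks of `blockSizeBytes` each,
    // assuming sequential transfer at a fixed bandwidth (an assumption).
    static double moveSeconds(long blocks, long blockSizeBytes, double bandwidthBytesPerSec) {
        return blocks * (double) blockSizeBytes / bandwidthBytesPerSec;
    }

    // Moving pays off only if the time saved on the job exceeds the move time.
    static boolean worthMoving(double moveSeconds, double jobSecondsBefore, double jobSecondsAfter) {
        return moveSeconds < (jobSecondsBefore - jobSecondsAfter);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Made-up inputs: 100 blocks of 64 MB, 50 MB/s effective bandwidth.
        double move = moveSeconds(100, 64 * mb, 50.0 * mb);
        System.out.println("Estimated move time (s): " + move);
        // Made-up job times: 600 s on the original nodes, 400 s on the faster ones.
        System.out.println("Worth moving: " + worthMoving(move, 600.0, 400.0));
    }
}
```

With these made-up numbers the move costs 128 seconds against a 200 second saving, so it would pay off, but I do not know how realistic the assumptions are, which is really what I am asking.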