All,
I am a newbie trying to learn the Hadoop internals. I have a few questions
about something I am trying to implement. I am still learning, so these
questions may seem silly or even be entirely wrong; any guidance is highly
appreciated.
I am working on release 0.18.3 and trying to do some job scheduling. Instead
of placing the data blocks on specific (desired) nodes when the input files are
copied into HDFS, I am trying to move the blocks from their original locations
to these desired, highly efficient nodes just before job submission.
My questions are:
1. Will such a change improve performance, given the overhead of moving the
data blocks?
2. I believe I will have to start from the NameNode to move the blocks. A brief
explanation of how to implement this, or pointers to sources with information
on it, would be very helpful.
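To make question 1 more concrete, this is roughly the break-even condition I
have in mind. It is a simplification that assumes the block moves finish before
the job starts and ignores contention with other jobs on the cluster. Moving the
blocks pays off only if the move time plus the job's running time on the new
placement is less than its running time on the original placement, and if the
same input is read by n jobs, the move cost is paid only once:

$$
T_{\text{move}} + n \, T_{\text{job}}^{\text{new}} < n \, T_{\text{job}}^{\text{orig}}
$$

Here $T_{\text{move}}$ is the time to copy the blocks to the desired nodes, and
$T_{\text{job}}^{\text{new}}$ and $T_{\text{job}}^{\text{orig}}$ are the job's
running times with the new and the original block placement.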
Thanks,
Arun