I have a question about HDFS. Suppose I have 3 machines in my HDFS cluster
and the replication factor is 1. A large file sits in the local file system
of one of those three machines. If I put that file into HDFS, will it be
split into blocks and distributed across all three machines? I ask because
HDFS is designed around the idea that "moving computation is cheaper than
moving data".
If the file is distributed across all three machines, a lot of data will have
to be transferred, whereas if the file is NOT distributed, the compute power
of the other two machines will go unused. Am I missing something here?
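
To make the trade-off concrete, here is a rough sketch of the arithmetic. The
1 GiB file size, the 64 MiB block size, and the round-robin placement are all
assumptions made for illustration; this is not a description of how HDFS
actually chooses where to place blocks. Node 0 stands for the machine that
holds the file locally.

```python
MIB = 1024 * 1024

def block_sizes(file_size, block_size):
    """Split a file of file_size bytes into block-sized pieces (last may be short)."""
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        sizes.append(remainder)
    return sizes

def network_bytes(file_size, block_size, num_nodes, spread):
    """Bytes that must leave node 0 (the machine holding the file), replication 1.

    spread=False: every block stays on node 0, so nothing crosses the network.
    spread=True:  blocks are assigned round-robin (a hypothetical policy used
                  only for this illustration), so every block whose index is
                  not a multiple of num_nodes is sent to another machine.
    """
    if not spread:
        return 0
    sizes = block_sizes(file_size, block_size)
    return sum(size for i, size in enumerate(sizes) if i % num_nodes != 0)

if __name__ == "__main__":
    file_size = 1024 * MIB   # assumed 1 GiB file
    block_size = 64 * MIB    # assumed block size
    print(len(block_sizes(file_size, block_size)), "blocks")
    print(network_bytes(file_size, block_size, 3, spread=True) // MIB, "MiB moved if spread")
    print(network_bytes(file_size, block_size, 3, spread=False) // MIB, "MiB moved if kept local")
```

Under these assumptions, spreading the file moves roughly two thirds of it
over the network once, while keeping it local moves nothing but leaves the
other two machines without local blocks to process.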
-Raj