Hello,
I'm trying out HDFS on a small test cluster and need to clarify some
doubts about Hadoop's behaviour.
Some details of my cluster:
Hadoop version: 0.20.2
I have two racks (rack1, rack2), with three datanodes in each rack.
Replication factor is set to 3.
According to the documentation:
"HDFS’s placement policy is to put one replica on one node in the local
rack, another on a node in a different (remote) rack, and the last on a
different node in the same remote rack."
However, I have noticed that a few blocks are sometimes stored
differently: two replicas in the local rack and one replica in a
different rack. Are there exceptions that cause behaviour different from
the default placement policy?
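To make my expectation concrete, here is a minimal Python sketch of the
policy as quoted above. It is only my reading of that sentence, not
Hadoop code: the function name place_replicas, the topology dictionary
and the datanode names dn1..dn6 are hypothetical.

```python
import random

def place_replicas(writer_rack, topology, rng=random):
    """Pick three (rack, node) pairs following the quoted policy.

    topology maps a rack name to the list of datanodes in that rack.
    This is an illustration of the documented rule only; it ignores
    node load, free space and any other checks a real cluster may apply.
    """
    # First replica: one node in the local (writer's) rack.
    first = rng.choice(topology[writer_rack])

    # Second and third replicas: two different nodes in one remote rack.
    remote_racks = [r for r, nodes in topology.items()
                    if r != writer_rack and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("no remote rack with at least two datanodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)

    return [(writer_rack, first), (remote_rack, second), (remote_rack, third)]

# Hypothetical layout matching my cluster: two racks, three datanodes each.
topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}
print(place_replicas("rack1", topology))
```

With this sketch, a write from rack1 always yields one replica in rack1
and two in rack2, which is the opposite of the two-local, one-remote
layout I sometimes observe.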
Likewise, some blocks are at times read from nodes in the remote rack
instead of nodes in the local rack. Why does this happen?
Another thing: if I have two datacenters with two racks in each of them
(so a hierarchical network topology), where are the two remote replicas
stored? Does Hadoop take the hierarchy into account and store one replica
in the local datacenter and two replicas in the other datacenter? Or are
the two remote replicas stored in a randomly chosen rack?
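To illustrate the hierarchical layout I mean, here is a small Python
sketch. The paths such as /dc1/rack1/dn1 are hypothetical names, and the
hop-count distance is only my assumption of how closeness could be
measured in such a tree, not something I have confirmed in Hadoop.

```python
def tree_distance(path_a, path_b):
    """Count the hops between two nodes in a /datacenter/rack/node tree.

    Each path is walked up to the closest common ancestor; the distance
    is the number of steps from both nodes to that ancestor.
    """
    parts_a = path_a.strip("/").split("/")
    parts_b = path_b.strip("/").split("/")

    # Length of the shared prefix (common ancestors).
    common = 0
    for a, b in zip(parts_a, parts_b):
        if a != b:
            break
        common += 1

    return (len(parts_a) - common) + (len(parts_b) - common)

# Hypothetical node paths: two datacenters, two racks in each.
print(tree_distance("/dc1/rack1/dn1", "/dc1/rack1/dn1"))  # same node: 0
print(tree_distance("/dc1/rack1/dn1", "/dc1/rack1/dn2"))  # same rack: 2
print(tree_distance("/dc1/rack1/dn1", "/dc1/rack2/dn4"))  # same datacenter: 4
print(tree_distance("/dc1/rack1/dn1", "/dc2/rack3/dn7"))  # other datacenter: 6
```

My question is whether the remote replicas would go to the rack at
distance 4 or to one at distance 6 in a layout like this.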
Thanks
Gianni