Questions about HDFS’s placement policy

Giovanni Marzulli Wed, 14 Mar 2012 09:25:27 -0700

Hello,

I'm testing HDFS on a small cluster and I need to clear up some doubts about Hadoop's behaviour.


Some details of my cluster:
Hadoop version: 0.20.2
I have two racks (rack1, rack2), with three datanodes in each rack.
Replication factor is set to 3.

"HDFS’s placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack."

Instead, I noticed that sometimes a few blocks of files are stored as follows: two replicas in the local rack and one replica in a different rack. Are there exceptions that cause behaviour different from the default placement policy?

Likewise, some blocks are at times read from nodes in the remote rack instead of nodes in the local rack. Why does this happen?
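To check that I read the quoted rule correctly, this is how I understand it, written as a small Python sketch. It is only my reading of the documented rule, not Hadoop's actual code: the function name `place_replicas` and the rack-to-nodes map are hypothetical, and the sketch ignores node load, free space and whatever fallbacks the real chooser applies (which may be exactly where the exceptions I'm asking about come from).

```python
import random


def place_replicas(topology, writer, rng=None):
    """Sketch of the documented 3-replica rule (hypothetical helper, not Hadoop code).

    topology: dict mapping rack name -> list of datanode names.
    writer:   the datanode the client is writing from.
    Assumes at least one remote rack with two or more nodes; ignores load,
    free space and any fallback behaviour of the real placement policy.
    """
    rng = rng or random.Random()
    # The writer's rack is the "local rack".
    local_rack = next(r for r, nodes in topology.items() if writer in nodes)
    # Replica 1: the writer's own node in the local rack.
    first = writer
    # Replica 2: a node in a different (remote) rack.
    remote_racks = [r for r in topology if r != local_rack and topology[r]]
    remote_rack = rng.choice(remote_racks)
    second = rng.choice(topology[remote_rack])
    # Replica 3: a different node in the same remote rack.
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]


if __name__ == "__main__":
    # My test layout: two racks, three datanodes each.
    topology = {
        "rack1": ["dn1", "dn2", "dn3"],
        "rack2": ["dn4", "dn5", "dn6"],
    }
    print(place_replicas(topology, "dn1", random.Random(0)))
```

Under this rule a block written from rack1 should never end up with two replicas in rack1, which is why the layouts I observed surprised me.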

Another thing: if I have two datacenters with two racks in each (i.e. a hierarchical network topology), where are the two remote replicas stored? Does Hadoop take the hierarchy into account and store one replica in the local datacenter and two replicas in the other datacenter? Or are the two replicas stored in a completely random rack?


Thanks
Gianni
