Hi,

On Mon, Dec 23, 2013 at 9:41 AM, Dhaivat Pandya <dhaivatpan...@gmail.com> wrote:
> Hi,
>
> I'm currently trying to build a cache layer that should sit "on top" of the
> datanode. Essentially, the namenode should know the port number of the
> cache layer instead of that of the datanode (since the namenode then relays
> this information to the default HDFS client). All of the communication
> between the datanode and the namenode currently flows through my cache
> layer (including heartbeats, etc.)

Curious Q: What does your cache layer aim to do, btw? If it's a data cache,
have you checked out the design currently being implemented in
https://issues.apache.org/jira/browse/HDFS-4949?

> *First question*: is there a way to tell the namenode where a datanode
> should be? Any way to trick it into thinking that the datanode is on a port
> number where it actually isn't? As far as I can tell, the port number is
> obtained from the DatanodeId object; can this be set in the configuration
> so that the port number derived is that of the cache layer?

The NN receives a DN's host and port from the DN directly. The DN sends
whatever address it is running on. See
https://github.com/apache/hadoop-common/blob/release-2.2.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L690

> I spent quite a bit of time on the above question and I could not find any
> sort of configuration option that would let me do that. So, I delved into
> the HDFS source code and tracked down the DatanodeRegistration class.
> However, I can't seem to find out *how* the NameNode figures out the
> Datanode's port number or if I could somehow change the packets to reflect
> the port number of cache layer?

See
https://github.com/apache/hadoop-common/blob/release-2.2.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L690
(as above) for how the DN emits it. And no, IMO, that ("packet changes") is
not the right way to go about it if you're planning an overhaul. It's easier
and more supportable to make proper code changes instead.

> *Second question: *how does the namenode
> figure out a newly-registered Datanode's port number?

Same as before. Registration sends the DN's service addresses (so that the NN
can hand them to clients). Beyond that, the DN's heartbeats are mere
client-like connections to the NN, made from regular ephemeral ports.

-- 
Harsh J
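
P.S. To make the registration flow above concrete, here is a minimal toy sketch in plain Java. It is not Hadoop code: `RegistrationSketch`, `Registration`, and `ToyNameNode` are hypothetical names, and the host names and port values are made up for illustration. It only models the two points above: the NN stores whatever address the DN reports when it registers, and a later heartbeat's ephemeral source port does not change that address.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not Hadoop code. It mirrors only the flow described
// above: the DN reports its own service address when it registers.
class RegistrationSketch {

    /** What the toy DN reports about itself at registration time. */
    static final class Registration {
        final String datanodeId;
        final String host;
        final int transferPort; // port clients should use for block data

        Registration(String datanodeId, String host, int transferPort) {
            this.datanodeId = datanodeId;
            this.host = host;
            this.transferPort = transferPort;
        }
    }

    /** Toy NN: remembers whatever address each DN reported. */
    static final class ToyNameNode {
        private final Map<String, Registration> registered = new HashMap<>();

        void register(Registration reg) {
            registered.put(reg.datanodeId, reg);
        }

        // A heartbeat arrives from an ephemeral source port. The toy NN only
        // checks that the DN is known; the stored address is left untouched.
        void heartbeat(String datanodeId, int ephemeralSourcePort) {
            if (!registered.containsKey(datanodeId)) {
                throw new IllegalStateException("unregistered datanode: " + datanodeId);
            }
        }

        /** Address the toy NN would hand to a client for this DN. */
        String addressForClients(String datanodeId) {
            Registration reg = registered.get(datanodeId);
            if (reg == null) {
                throw new IllegalStateException("unregistered datanode: " + datanodeId);
            }
            return reg.host + ":" + reg.transferPort;
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        // The DN decides what to report; the NN does not discover it.
        nn.register(new Registration("dn-1", "dn1.example.com", 50010));
        nn.heartbeat("dn-1", 49152);
        System.out.println(nn.addressForClients("dn-1"));
    }
}
```

In this toy model, the only way to make clients see a cache layer's port is for the registering side to report that port, which is why changing what the DN sends looks cleaner to me than rewriting packets in flight.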