Hi, I'm currently trying to build a cache layer that should sit "on top" of the DataNode. Essentially, the NameNode should know the port number of the cache layer instead of that of the DataNode, since the NameNode relays this information to the default HDFS client. All communication between the DataNode and the NameNode currently flows through my cache layer (including heartbeats, etc.).
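To make the idea concrete, here is a rough Java sketch of what I'd like the cache layer to do with the registration it relays. It assumes that the NameNode hands clients whatever transfer port the DataNode reports about itself. `Registration` and `advertiseCachePort` are hypothetical names of my own: the record is a simplified stand-in loosely modeled on the ports carried by DatanodeID, not Hadoop's actual DatanodeRegistration class. The real registration travels as an RPC message, which the cache layer would have to decode and re-encode.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: a cache layer between a DataNode and the NameNode
 * rewrites the data-transfer port in the DataNode's registration before
 * forwarding it, so that clients would be pointed at the cache layer.
 * Registration is a simplified stand-in, NOT Hadoop's DatanodeRegistration.
 */
public class RegistrationRewriter {

    /** Simplified view of what a DataNode reports about itself (field names modeled on DatanodeID). */
    public record Registration(String hostName, int xferPort, int infoPort, int ipcPort) {
        public Registration {
            Objects.requireNonNull(hostName, "hostName");
            checkPort(xferPort);
            checkPort(infoPort);
            checkPort(ipcPort);
        }
    }

    /** Rejects values outside the valid TCP port range. */
    private static void checkPort(int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
    }

    /**
     * Returns a copy of the registration whose data-transfer port points at
     * the cache layer. Host, HTTP and IPC ports are passed through unchanged
     * (an assumption: only the port clients use for block data is replaced).
     */
    public static Registration advertiseCachePort(Registration fromDataNode, int cacheXferPort) {
        return new Registration(
                fromDataNode.hostName(),
                cacheXferPort,
                fromDataNode.infoPort(),
                fromDataNode.ipcPort());
    }

    public static void main(String[] args) {
        Registration original = new Registration("dn1.example.com", 50010, 50075, 50020);
        Registration forwarded = advertiseCachePort(original, 60010);
        System.out.println(original + " -> " + forwarded);
    }
}
```

Whether this works at all depends on where the NameNode really takes the port from, which is exactly what I'm asking below.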
*First question:* Is there a way to tell the NameNode where a DataNode is? Is there any way to trick it into thinking that the DataNode is listening on a port where it actually isn't? As far as I can tell, the port number is obtained from the DatanodeID object; can this be set in the configuration so that the derived port number is that of the cache layer?

I spent quite a bit of time on this question and could not find any configuration option that would let me do that. So I delved into the HDFS source code and tracked down the DatanodeRegistration class. However, I can't figure out *how* the NameNode determines the DataNode's port number, or whether I could modify the packets so that they carry the cache layer's port number instead.

*Second question:* How does the NameNode figure out a newly registered DataNode's port number?

Thank you,
Dhaivat