Hi folks,

The current implementation of RBF (Router-Based Federation) is not aware of
data locality: when an RPC request is forwarded by the Router, the NameNode
cannot obtain the real client hostname by invoking Server#getRemoteAddress.
This leads to several problems, for instance:

   - a. Clients may have to fall back to remote reads instead of local
   reads, so Short-Circuit reads cannot be used in most cases.
   - b. The block placement policy cannot work as expected with the defined
   rack awareness, so the local-node write is lost.

After discussion, several solutions to the data locality issue have been
proposed. Some of them change the RPC protocol, so we look forward to
further suggestions and votes. HDFS-13248 is tracking the issue.

   - Approach A: Change the IPC/RPC layer protocol (IpcConnectionContextProto
   or RpcHeader#RpcRequestHeaderProto) and add an extra field for the client
   hostname. The new field is optional; in general it is only set by the
   Router and parsed by the NameNode. This approach is compatible, and
   clients need no changes.
   - Approach B: Change ClientProtocol and add extra variants of
   create/append/getBlockLocations that take an additional parameter for the
   client hostname. As in Approach A, it is set by the Router and parsed by
   the NameNode, and it is also compatible.
   - Approach C: Solve write and read locality separately on top of the
   current interfaces, with no changes. For writes, pass the client hostname
   as one of the favored nodes for addBlock; for reads, reorder the targets
   at the Router after the NameNode returns its result.
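
To make the read half of Approach C concrete, here is a minimal sketch of
the reordering step at the Router. It is not taken from any patch: the class
and method names are hypothetical, and replicas are modelled as plain
hostname strings rather than the real block location types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not part of Hadoop or of any HDFS-13248 patch.
public class LocalityReorderSketch {

    /**
     * Returns a copy of replicaHosts in which replicas on clientHost come
     * first. The relative order of all other replicas is preserved, so the
     * NameNode's original sorting is kept for the remote replicas.
     */
    public static List<String> reorderForClient(List<String> replicaHosts,
                                                String clientHost) {
        List<String> local = new ArrayList<>();
        List<String> remote = new ArrayList<>();
        for (String host : replicaHosts) {
            if (host.equals(clientHost)) {
                local.add(host);
            } else {
                remote.add(host);
            }
        }
        local.addAll(remote);
        return local;
    }
}
```

Note that this only moves an exactly matching host to the front; a real
implementation would also have to consider rack distance.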

Based on the discussion and evaluation in HDFS-13248, we prefer changing the
IPC/RPC layer protocol to support data locality. More suggestions, votes, or
any other feedback to push this feature forward are welcome. Thanks.
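
For illustration only, the NameNode-side decision in Approach A could look
roughly like the sketch below. Everything in it is an assumption: the class
name, the method, and in particular the trusted-Router check, which is my
own addition and has not been discussed above. The optional forwarded
hostname stands in for the new optional protocol field.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: not part of Hadoop or of any HDFS-13248 patch.
public class ClientHostResolverSketch {

    // Assumption: the NameNode knows which callers are Routers.
    private final Set<String> trustedRouters;

    public ClientHostResolverSketch(Set<String> trustedRouters) {
        this.trustedRouters = new HashSet<>(trustedRouters);
    }

    /**
     * Picks the hostname to use for locality decisions.
     *
     * @param remoteAddress       the caller address as seen by the RPC server
     * @param forwardedClientHost value of the optional field; empty when the
     *                            caller did not set it (e.g. a plain client)
     */
    public String resolve(String remoteAddress,
                          Optional<String> forwardedClientHost) {
        // Use the forwarded value only when it comes from a known Router.
        if (forwardedClientHost.isPresent()
                && trustedRouters.contains(remoteAddress)) {
            return forwardedClientHost.get();
        }
        // Otherwise behave exactly as today.
        return remoteAddress;
    }
}
```

Because the field is optional, a client that never sets it takes the second
branch and sees no change in behaviour.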

Best Regards,
Hexiaoqiao

References
[1] https://issues.apache.org/jira/browse/HDFS-13248
[2] https://issues.apache.org/jira/browse/HDFS-10467
[3] https://issues.apache.org/jira/browse/HDFS-12615
