Thanks for your great work; I'm looking forward to this feature. Some comments from me:
- I checked and found that there are still 3 subtasks under this feature's JIRA ticket. Do they need to be resolved?
- I didn't find any documentation for this feature. Since it is a key feature, should documentation be added to HDFSRouterFederation.md?
On Wed, Jan 22, 2025 at 10:29, jian zhang <zjkeeprom...@gmail.com> wrote:
> Hi all,
>
> The development of the asynchronous router functionality has been completed. The development branch is HDFS-17531, and it is ready to be merged into the trunk branch.
>
> JIRA: HDFS-17531 https://issues.apache.org/jira/browse/HDFS-17531
> PR: https://github.com/apache/hadoop/pull/7308
>
> Here is an introduction to the asynchronous router's functionality for everyone to review:
>
> I. Overview
>
> The asynchronous router aims to address the performance bottlenecks of the synchronous router in high-concurrency, multi-nameservice scenarios. By introducing an asynchronous processing mechanism, it optimizes the request handling process and improves the system's concurrency and resource utilization. It is particularly suitable for federated scenarios in which requests must be forwarded to multiple downstream nameservices (NS).
>
> II. Problems of the Synchronous Router
>
> - Performance Bottleneck: The throughput of the synchronous router is limited by the number of handler threads. Even though the connection thread could still forward requests to the downstream NameNode, each handler must wait for its current request to complete before processing the next one, which caps processing capacity.
> - Thread Resource Waste: Increasing the number of handler threads to improve performance leads to more thread context switches, which in turn reduces system efficiency. At the same time, many handler threads sit in a blocked state, wasting thread resources.
> - Poor Isolation Across Multiple NS: If one downstream nameservice performs poorly, handlers wait on it for a long time, which delays the forwarding of requests to the other, healthy nameservices. As a result, the client perceives degraded performance across all downstream nameservices.
> - Ineffective Utilization of Multi-NS Federation Capacity: In high-concurrency scenarios, a large number of requests may back up in the router's request queue while the queues of the downstream services are not fully utilized, leading to unbalanced resource allocation.
>
> III. Design and Improvements of the Asynchronous Router
>
> The asynchronous router solves the problems above by redesigning the request handling process around an asynchronous processing mechanism. Its core components are:
>
> - Handler: Retrieves requests from the request queue and performs preliminary processing. If a request fails at this stage (for example, the mount point does not exist), the handler puts the response directly into the response queue; otherwise, it passes the request to the async handler thread pool.
> - Async Handler: Puts the request into the connection thread's call queue (connection.calls) and returns immediately, without blocking to wait for the result.
> - Async Responder: Processes the responses received by the connection thread. If a request needs to be re-issued (for example, the downstream service returns a standby exception), it adds the request back to the async handler thread pool; otherwise, it puts the response into the response queue.
> - Responder: Retrieves responses from the response queue and returns them to the client.
>
> IV. Advantages of the Asynchronous Router
>
> - High-Concurrency Performance: Through the asynchronous processing mechanism, the router can handle a large number of requests simultaneously, significantly improving the system's concurrent processing capacity.
> - High Resource Utilization: Avoiding thread blocking and frequent context switching reduces wasted thread resources and improves the overall efficiency of the system.
> - Isolation: Each nameservice is processed by its own async handler thread pool, which isolates the downstream services from one another. Even if one service performs poorly, it does not affect the processing of requests to the other services.
>
> V. Summary
>
> The asynchronous router solves the performance bottleneck of the traditional synchronous router in high-concurrency scenarios by introducing an asynchronous processing mechanism. It improves the system's concurrency and resource utilization, and it isolates downstream services through the queue mechanism, enhancing the system's stability and adaptability. In federated scenarios where requests must be routed to multiple downstream services, the asynchronous router is a more efficient and reliable solution.
>
> VI. Performance Testing
>
> https://docs.google.com/document/d/1meHOCvhm3XRHlIMwvKFidfUSjveTJrb8yAMasrM_HrY/edit?tab=t.0#heading=h.du0zlo2k5sb1
>
> VII. JIRA & PRs
>
> For more information, please refer to:
> JIRA: RBF: Asynchronous router RPC: https://issues.apache.org/jira/browse/HDFS-17531
> PRs:
> HDFS-17543. [ARR] AsyncUtil makes asynchronous code more concise and easier.
> HADOOP-19235. IPC client uses CompletableFuture to support asynchronous operations.
> HDFS-17544. [ARR] The router client rpc protocol PB supports asynchrony.
> HDFS-17545. [ARR] router async rpc client.
> HDFS-17594. [ARR] RouterCacheAdmin supports asynchronous rpc.
> HDFS-17597. [ARR] RouterSnapshot supports asynchronous rpc.
> HDFS-17595. [ARR] ErasureCoding supports asynchronous rpc.
> HDFS-17601. [ARR] RouterRpcServer supports asynchronous rpc.
> HDFS-17596. [ARR] RouterStoragePolicy supports asynchronous rpc.
> HDFS-17656. [ARR] RouterNamenodeProtocol and RouterUserProtocol supports asynchronous rpc.
> HDFS-17659. [ARR] Router Quota supports asynchronous rpc.
> HDFS-17672. [ARR] Move asynchronous related classes to the async package.
> HADOOP-19361. RPC DeferredMetrics bugfix.
> HDFS-17640. [ARR] RouterClientProtocol supports asynchronous rpc.
> HDFS-17650. [ARR] The router server-side rpc protocol PB supports asynchrony.
> HDFS-17651. [ARR] Async handler executor isolation.
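The Handler / Async Handler / Async Responder / Responder flow described in section III, together with the per-nameservice thread pool isolation from section IV, can be illustrated with plain `java.util.concurrent.CompletableFuture`. This is a minimal, hypothetical sketch, not the code in HDFS-17531: the class `AsyncRouterSketch`, the path-to-nameservice map standing in for the mount table, the `downstream` function standing in for the IPC client and connection thread, `StandbyException`, the pool size of 2, and the retry limit are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiFunction;

/**
 * Hypothetical, simplified model of the async router request flow.
 * Not the HDFS-17531 implementation; every name here is illustrative.
 */
final class AsyncRouterSketch {

  /** Stand-in for a downstream "NameNode is standby" error that should be retried. */
  static final class StandbyException extends RuntimeException {
    StandbyException(String msg) { super(msg); }
  }

  private final Map<String, String> mountTable; // path -> nameservice (assumed shape)
  // Non-blocking RPC stand-in: returns a future instead of waiting for the reply.
  private final BiFunction<String, String, CompletableFuture<String>> downstream;
  private final int maxRetries;
  // One async-handler pool per nameservice, so work for one NS never queues behind another NS.
  private final Map<String, ExecutorService> asyncHandlers = new ConcurrentHashMap<>();

  AsyncRouterSketch(Map<String, String> mountTable,
      BiFunction<String, String, CompletableFuture<String>> downstream, int maxRetries) {
    this.mountTable = mountTable;
    this.downstream = downstream;
    this.maxRetries = maxRetries;
  }

  /** Handler: preliminary checks; a failing request is answered at once, valid ones are handed off. */
  CompletableFuture<String> handle(String path) {
    String ns = mountTable.get(path);
    if (ns == null) {
      return CompletableFuture.failedFuture(
          new IllegalArgumentException("No mount point for " + path));
    }
    return dispatch(ns, path, 0);
  }

  /** Async handler: issue the call on the NS-specific pool and return a future without blocking. */
  private CompletableFuture<String> dispatch(String ns, String path, int attempt) {
    ExecutorService pool =
        asyncHandlers.computeIfAbsent(ns, k -> Executors.newFixedThreadPool(2));
    return CompletableFuture
        .supplyAsync(() -> downstream.apply(ns, path), pool)
        .thenCompose(call -> call)
        // Async responder: re-dispatch on a standby error, otherwise pass the outcome on.
        .handle((result, error) -> {
          if (error == null) {
            return CompletableFuture.completedFuture(result);
          }
          Throwable cause = error instanceof CompletionException && error.getCause() != null
              ? error.getCause() : error;
          if (cause instanceof StandbyException && attempt < maxRetries) {
            return dispatch(ns, path, attempt + 1);
          }
          return CompletableFuture.<String>failedFuture(cause);
        })
        .thenCompose(next -> next);
  }

  /** Stops the per-nameservice pools so the JVM can exit. */
  void shutdown() {
    asyncHandlers.values().forEach(ExecutorService::shutdown);
  }

  public static void main(String[] args) {
    AsyncRouterSketch router = new AsyncRouterSketch(
        Map.of("/data", "ns0", "/logs", "ns1"),
        (ns, path) -> CompletableFuture.completedFuture(ns + path), 1);
    // Only the edge waits here (playing the Responder/client role); no pool thread blocks.
    System.out.println(router.handle("/data").join()); // prints ns0/data
    router.shutdown();
  }
}
```

The point of the sketch is that no thread inside the pipeline calls `join()`: the handler returns as soon as the call is queued, and the retry on a standby error is chained as a callback rather than performed by a waiting thread. In the design described above, the completed future would instead feed the response queue that the Responder drains.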