+1. In addition to thanking the developers, special thanks to Xiaoqiao He for pushing this feature.
Best Regards,
- Shilun Fan

On Thu, Feb 13, 2025 at 2:29 PM Xiaoqiao He <hexiaoq...@apache.org> wrote:

> Great. +1 from my side. Thanks.
>
> Best Regards,
> - He Xiaoqiao
>
> On Thu, Feb 13, 2025 at 10:15 AM jian zhang <keeprom...@apache.org> wrote:
>
>> Hi, He Xiaoqiao
>>
>> I have rebased HDFS-17531 again and resolved the conflicts. The current
>> pipeline failure is unrelated to the ARR feature and was introduced by
>> slfan1989's PR: HADOOP-19415. [JDK17] Upgrade JUnit from 4 to 5 in
>> hadoop-common Part 1. (#7339). slfan1989 will fix it later.
>>
>> Best Regards,
>> - Jian Zhang
>>
>> On Tue, Feb 11, 2025 at 11:15, Xiaoqiao He <hexiaoq...@apache.org> wrote:
>>
>> > Hi Jian Zhang, Thanks for your great work. Please fix the conflict
>> > first; the rest makes sense to me.
>> > I will give my +1 once it is ready.
>> > Another thing: before check-in we need to launch another official vote
>> > thread. Good luck.
>> >
>> > BTW, Happy lunar new year!
>> >
>> > Best Regards,
>> > - He Xiaoqiao
>> >
>> > On Thu, Feb 6, 2025 at 5:30 PM jian zhang <keeprom...@apache.org> wrote:
>> >
>> >> Hi, all,
>> >> This feature has now been developed and has passed the pipeline.
>> >> Please continue to help review it.
>> >>
>> >> Best Regards,
>> >> Jian Zhang
>> >>
>> >> On Wed, Jan 22, 2025 at 18:22, Zhanghaobo <hfutzhan...@163.com> wrote:
>> >>
>> >>> @Hui Fei Hi, Sir:
>> >>> For the first opinion, I have created an umbrella JIRA
>> >>> https://issues.apache.org/jira/browse/HDFS-17716
>> >>> and moved the non-core JIRAs under it.
>> >>>
>> >>> Best Wishes
>> >>> Haobo Zhang
>> >>>
>> >>> ---- Replied Message ----
>> >>> From: Hui Fei <feihui.u...@gmail.com>
>> >>> Date: 01/22/2025 17:37
>> >>> To: jian zhang <keeprom...@apache.org>
>> >>> Cc: Hdfs-dev <hdfs-dev@hadoop.apache.org>, <priv...@hadoop.apache.org>,
>> >>>     Xiaoqiao He <hexiaoq...@apache.org>, <common-...@hadoop.apache.org>
>> >>> Subject: Re: [DISCUSS] Request to merge branch HDFS-17531 into trunk.
>> >>>
>> >>> Got your idea. Thank you!
>> >>> - How about removing the unfinished tasks and placing them under a new
>> >>>   task as subtasks, like ARR improvements? If this feature is completed
>> >>>   but there are still some open tasks, it looks strange.
>> >>> - Will it take a long time to add documentation? Discussion may last for
>> >>>   several days. If it takes a long time, I think it may block the trunk
>> >>>   release, and all community members would need to remember that
>> >>>   documentation is still required. It doesn't look good. That's my
>> >>>   thought, and we can wait for others' opinions.
>> >>>
>> >>> On Wed, Jan 22, 2025 at 16:53, jian zhang <keeprom...@apache.org> wrote:
>> >>>
>> >>> Hi, Hui Fei,
>> >>> - The remaining 3 subtasks are not related to the core functions of the
>> >>>   asynchronous router and have little impact on the trunk branch. We can
>> >>>   wait until HDFS-17531 is merged into trunk and then submit the
>> >>>   remaining PRs directly to trunk.
>> >>> - It is indeed necessary to add documentation to
>> >>>   "HDFSRouterFederation.md". How about submitting a PR to do this after
>> >>>   merging HDFS-17531 into the trunk branch?
>> >>>
>> >>> Best Regards,
>> >>> Jian Zhang
>> >>>
>> >>> On Wed, Jan 22, 2025 at 16:24, Hui Fei <feihui.u...@gmail.com> wrote:
>> >>>
>> >>> Thanks for your great work, looking forward to this feature.
>> >>>
>> >>> Some comments from me.
>> >>> - I checked and found that there are still 3 subtasks under this
>> >>>   feature's JIRA ticket. Do they need to be resolved?
>> >>> - I didn't find the documentation for this feature. It's a key feature.
>> >>>   Is it necessary to add documentation to HDFSRouterFederation.md?
>> >>>
>> >>> On Wed, Jan 22, 2025 at 10:29, jian zhang <zjkeeprom...@gmail.com> wrote:
>> >>>
>> >>> Hi, all, the development of the asynchronous router functionality has
>> >>> been completed. The development branch is HDFS-17531, and it is ready to
>> >>> be merged into the trunk branch.
>> >>>
>> >>> JIRA: HDFS-17531 https://issues.apache.org/jira/browse/HDFS-17531
>> >>> PR: https://github.com/apache/hadoop/pull/7308
>> >>>
>> >>> Here is the functionality introduction of the asynchronous router for
>> >>> everyone to review:
>> >>>
>> >>> I. Overview
>> >>>
>> >>> The asynchronous router aims to address the performance bottleneck of
>> >>> the synchronous router in high-concurrency and multi-nameservice
>> >>> scenarios. By introducing an asynchronous processing mechanism, it
>> >>> optimizes the request handling process, improves the system's
>> >>> concurrency and resource utilization, and is particularly suitable for
>> >>> federated scenarios where multiple downstream services (NS) need to be
>> >>> processed.
>> >>>
>> >>> II. Problems of the Synchronous Router
>> >>>
>> >>> - Performance Bottleneck: The performance of the synchronous router is
>> >>>   limited by the number of handler threads.
>> >>>   Even if the connection thread can still forward requests to the
>> >>>   downstream namenode, the handler must wait for each request to
>> >>>   complete before processing the next one, resulting in limited
>> >>>   processing capacity.
>> >>> - Thread Resource Waste: Increasing the number of handler threads to
>> >>>   improve performance leads to more thread switches, which instead
>> >>>   reduces system efficiency. At the same time, a large number of handler
>> >>>   threads sit in a blocked state, wasting thread resources.
>> >>> - Poor Isolation in Multi-ns: If one downstream nameservice performs
>> >>>   poorly, handlers wait on it for a long time, which delays the
>> >>>   forwarding of requests to other, normally performing ns. The client
>> >>>   then perceives a drop in the overall performance of the downstream ns
>> >>>   services.
>> >>> - Ineffective Utilization of Federation Multi-ns Performance: In
>> >>>   high-concurrency scenarios, a large number of requests may be
>> >>>   backlogged in the router's request queue while the queues of
>> >>>   downstream services are not fully utilized, leading to unreasonable
>> >>>   resource allocation.
>> >>>
>> >>> III. Design and Improvements of the Asynchronous Router
>> >>>
>> >>> The asynchronous router solves the above problems by redesigning the
>> >>> request handling process and introducing an asynchronous processing
>> >>> mechanism. Its core improvements include:
>> >>>
>> >>> - Handler: Retrieves requests from the request queue for preliminary
>> >>>   processing. If the request raises an exception (for example, the mount
>> >>>   point does not exist), it puts the response directly into the response
>> >>>   queue; otherwise, it sends the request to the asynchronous handler
>> >>>   thread pool.
>> >>> - Async Handler: Puts the request into the call queue
>> >>>   (connection.calls) of the connection thread and returns immediately
>> >>>   without blocking and waiting.
>> >>> - Async Responder: Is responsible for processing the responses received
>> >>>   by the connection thread. If the request needs to be re-initiated
>> >>>   (such as the downstream service returns a standby exception), it
>> >>>   re-adds the request to the asynchronous handler thread pool;
>> >>>   otherwise, it puts the response into the response queue.
>> >>> - Responder: Retrieves the response from the response queue and returns
>> >>>   it to the client.
>> >>>
>> >>> IV. Advantages of the Asynchronous Router
>> >>>
>> >>> - High-Concurrency Performance: Through the asynchronous processing
>> >>>   mechanism, the asynchronous router can handle a large number of
>> >>>   requests simultaneously, significantly improving the system's
>> >>>   concurrent processing ability.
>> >>> - High Resource Utilization: It avoids thread blocking and frequent
>> >>>   switching, reduces thread resource waste, and improves the overall
>> >>>   efficiency of the system.
>> >>> - Isolation: Different ns are processed by different async handler
>> >>>   thread pools, achieving isolation of different downstream services.
>> >>>   Even if the performance of a certain service is poor, it will not
>> >>>   affect the processing ability of other services.
>> >>>
>> >>> V. Summary
>> >>>
>> >>> The asynchronous router solves the performance bottleneck problem of
>> >>> the traditional synchronous router in high-concurrency scenarios by
>> >>> introducing an asynchronous processing mechanism. It not only improves
>> >>> the system's concurrency ability and resource utilization but also
>> >>> achieves isolation of downstream services through the queue mechanism,
>> >>> enhancing the system's stability and adaptability.
>> >>> In federated scenarios where multiple downstream services need to be
>> >>> processed, the asynchronous router is a more efficient and reliable
>> >>> solution.
>> >>>
>> >>> VI. Performance Testing
>> >>>
>> >>> https://docs.google.com/document/d/1meHOCvhm3XRHlIMwvKFidfUSjveTJrb8yAMasrM_HrY/edit?tab=t.0#heading=h.du0zlo2k5sb1
>> >>>
>> >>> VII. JIRA & PRs
>> >>>
>> >>> For more information, please refer to JIRA:
>> >>> JIRA: RBF: Asynchronous router RPC:
>> >>> https://issues.apache.org/jira/browse/HDFS-17531
>> >>> PRs:
>> >>> HDFS-17543. [ARR] AsyncUtil makes asynchronous code more concise and
>> >>> easier.
>> >>> HADOOP-19235. IPC client uses CompletableFuture to support
>> >>> asynchronous operations.
>> >>> HDFS-17544. [ARR] The router client rpc protocol PB supports
>> >>> asynchrony.
>> >>> HDFS-17545. [ARR] router async rpc client.
>> >>> HDFS-17594. [ARR] RouterCacheAdmin supports asynchronous rpc.
>> >>> HDFS-17597. [ARR] RouterSnapshot supports asynchronous rpc.
>> >>> HDFS-17595. [ARR] ErasureCoding supports asynchronous rpc.
>> >>> HDFS-17601. [ARR] RouterRpcServer supports asynchronous rpc.
>> >>> HDFS-17596. [ARR] RouterStoragePolicy supports asynchronous rpc.
>> >>> HDFS-17656. [ARR] RouterNamenodeProtocol and RouterUserProtocol
>> >>> supports asynchronous rpc.
>> >>> HDFS-17659. [ARR] Router Quota supports asynchronous rpc.
>> >>> HDFS-17672. [ARR] Move asynchronous related classes to the async
>> >>> package.
>> >>> HADOOP-19361. RPC DeferredMetrics bugfix.
>> >>> HDFS-17640. [ARR] RouterClientProtocol supports asynchronous rpc.
>> >>> HDFS-17650. [ARR] The router server-side rpc protocol PB supports
>> >>> asynchrony.
>> >>> HDFS-17651. [ARR] Async handler executor isolation.
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>> >>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
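
For readers skimming the thread, the request flow described in section III of the quoted proposal (Handler, Async Handler, Async Responder, Responder) and the per-nameservice isolation from section IV can be sketched roughly as below. This is only a minimal illustration built on java.util.concurrent.CompletableFuture, not code from the HDFS-17531 branch: the class AsyncRouterSketch, the StandbySignal exception, the methods handle and dispatch, and the pool size of 2 are all hypothetical names and values chosen for the example, and a plain Function stands in for the downstream namenode call.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Illustrative only: every name here is hypothetical, not code from HDFS-17531. */
final class AsyncRouterSketch {
  /** Stand-in for "the downstream namenode is in standby"; triggers a retry. */
  static final class StandbySignal extends RuntimeException {}

  // Mount table stand-in: nameservice -> fake namenode call (path -> response).
  private final Map<String, Function<String, String>> mounts;
  // One async handler pool per nameservice, so a slow ns only ties up its own threads.
  private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
  private final int maxRetries;

  AsyncRouterSketch(Map<String, Function<String, String>> mounts, int maxRetries) {
    this.mounts = mounts;
    this.maxRetries = maxRetries;
  }

  /** Handler stage: answer invalid requests at once, otherwise hand off and return. */
  CompletableFuture<String> handle(String ns, String path) {
    Function<String, String> namenode = mounts.get(ns);
    if (namenode == null) {
      // No mount point: the response is produced without touching any async pool.
      return CompletableFuture.completedFuture("ERROR: no mount point for " + ns);
    }
    return dispatch(ns, namenode, path, 0);
  }

  /** Async handler and async responder stages for a single nameservice. */
  private CompletableFuture<String> dispatch(
      String ns, Function<String, String> namenode, String path, int attempt) {
    ExecutorService pool = pools.computeIfAbsent(ns, k -> Executors.newFixedThreadPool(2));
    return CompletableFuture
        .supplyAsync(() -> namenode.apply(path), pool)
        .<CompletableFuture<String>>handle((response, error) -> {
          if (error == null) {
            return CompletableFuture.completedFuture(response);
          }
          // supplyAsync wraps the thrown exception; unwrap it to inspect the cause.
          Throwable cause = error.getCause() != null ? error.getCause() : error;
          if (cause instanceof StandbySignal && attempt < maxRetries) {
            return dispatch(ns, namenode, path, attempt + 1); // re-queue the request
          }
          return CompletableFuture.completedFuture("ERROR: " + cause);
        })
        .thenCompose(future -> future);
  }

  void shutdown() {
    pools.values().forEach(ExecutorService::shutdown);
  }

  public static void main(String[] args) {
    Map<String, Function<String, String>> mounts = new ConcurrentHashMap<>();
    mounts.put("ns1", path -> "ns1 served " + path);
    AsyncRouterSketch router = new AsyncRouterSketch(mounts, 1);
    // join() is used here only to print the results; a real responder would not block.
    System.out.println(router.handle("ns1", "/data").join());
    System.out.println(router.handle("ns9", "/data").join());
    router.shutdown();
  }
}
```

The caller of handle gets a future back immediately, which mirrors the point that the handler no longer waits for the downstream call. Keeping a separate executor per nameservice is one simple way to model the isolation claim; the actual branch may organize its pools and queues differently.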