+1. In addition to thanking the developers, special thanks to XiaoQiao He for pushing this feature.
Best Regards,
- Shilun Fan

On Fri, Feb 14, 2025 at 3:40 PM Hui Fei <feihui.u...@gmail.com> wrote:

> Did some test work referring to the documentation.
>
> - Compiled the source code, built a local cluster, and the async feature
>   worked fine.
> - It is disabled by default.
> - The thread number can be increased or decreased by changing the related
>   configurations.
>
> The feature is as described in the documentation. Great work, +1
>
> Xiaoqiao He <hexiaoq...@apache.org> wrote on Thu, Feb 13, 2025 at 14:29:
>
>> Great. +1 from my side. Thanks.
>>
>> Best Regards,
>> - He Xiaoqiao
>>
>> On Thu, Feb 13, 2025 at 10:15 AM jian zhang <keeprom...@apache.org> wrote:
>>
>>> Hi, He Xiaoqiao
>>>
>>> I have rebased HDFS-17531 again and resolved the conflicts. The current
>>> pipeline failure is unrelated to the ARR feature and was introduced by
>>> slfan1989's PR: HADOOP-19415. [JDK17] Upgrade JUnit from 4 to 5 in
>>> hadoop-common Part 1. (#7339). slfan1989 will fix it later.
>>>
>>> Best Regards,
>>> - Jian Zhang
>>>
>>> Xiaoqiao He <hexiaoq...@apache.org> wrote on Tue, Feb 11, 2025 at 11:15:
>>>
>>> > Hi Jian Zhang, Thanks for your great work. Please fix the conflict
>>> > first; the rest makes sense to me.
>>> > I will give my +1 once it is ready.
>>> > Another thing: before check-in we need to launch another official vote
>>> > thread. Good luck.
>>> >
>>> > BTW, Happy lunar new year!
>>> >
>>> > Best Regards,
>>> > - He Xiaoqiao
>>> >
>>> > On Thu, Feb 6, 2025 at 5:30 PM jian zhang <keeprom...@apache.org> wrote:
>>> >
>>> >> Hi, all,
>>> >> Currently this feature has been developed and passed the pipeline.
>>> >> Please continue to help review this feature.
>>> >>
>>> >> Best Regards,
>>> >> Jian Zhang
>>> >>
>>> >> Zhanghaobo <hfutzhan...@163.com> wrote on Wed, Jan 22, 2025 at 18:22:
>>> >>
>>> >>> @Hui Fei Hi, Sir:
>>> >>> Regarding the first point, I have created an umbrella JIRA
>>> >>> https://issues.apache.org/jira/browse/HDFS-17716
>>> >>> and moved the non-core JIRAs under it.
>>> >>>
>>> >>> Best Wishes
>>> >>> Haobo Zhang
>>> >>>
>>> >>> ---- Replied Message ----
>>> >>> From: Hui Fei <feihui.u...@gmail.com>
>>> >>> Date: 01/22/2025 17:37
>>> >>> To: jian zhang <keeprom...@apache.org>
>>> >>> Cc: Hdfs-dev <hdfs-dev@hadoop.apache.org>, <priv...@hadoop.apache.org>,
>>> >>>     Xiaoqiao He <hexiaoq...@apache.org>, <common-...@hadoop.apache.org>
>>> >>> Subject: Re: [DISCUSS] Request to merge branch HDFS-17531 into trunk.
>>> >>>
>>> >>> Got your idea. Thank you!
>>> >>> - How about removing the unfinished tasks and placing them under a new
>>> >>>   task as subtasks, like "ARR improvements"? If this feature is
>>> >>>   completed but there are still some open tasks, it looks strange.
>>> >>> - Will it take a long time to add documentation? The discussion may last
>>> >>>   for several days. If it takes a long time, I think it may block the
>>> >>>   trunk release, and all community members would need to remember that
>>> >>>   documentation is still required. It doesn't look good. That's my
>>> >>>   thought, and we can wait for others' opinions.
>>> >>>
>>> >>> jian zhang <keeprom...@apache.org> wrote on Wed, Jan 22, 2025 at 16:53:
>>> >>>
>>> >>> Hi, Hui Fei,
>>> >>> - The remaining 3 subtasks are not related to the core functions of the
>>> >>>   asynchronous router and have little impact on the trunk branch. We can
>>> >>>   wait until HDFS-17531 is merged into trunk and then submit the
>>> >>>   remaining PRs directly to trunk.
>>> >>> - It is indeed necessary to add documentation to
>>> >>>   "HDFSRouterFederation.md". How about submitting a PR to do this after
>>> >>>   merging HDFS-17531 into the trunk branch?
>>> >>>
>>> >>> Best Regards,
>>> >>> Jian Zhang
>>> >>>
>>> >>> Hui Fei <feihui.u...@gmail.com> wrote on Wed, Jan 22, 2025 at 16:24:
>>> >>>
>>> >>> Thanks for your great work, looking forward to this feature.
>>> >>>
>>> >>> Some comments from me.
>>> >>> - I checked and found that there are still 3 subtasks under this
>>> >>>   feature's JIRA ticket. Do they need to be resolved?
>>> >>> - I didn't find the documentation for this feature. It's a key feature.
>>> >>>   Is it necessary to add documentation to HDFSRouterFederation.md?
>>> >>>
>>> >>> jian zhang <zjkeeprom...@gmail.com> wrote on Wed, Jan 22, 2025 at 10:29:
>>> >>>
>>> >>> Hi, all, the development of the asynchronous router functionality has
>>> >>> been completed. The development branch is HDFS-17531, and it is ready to
>>> >>> be merged into the trunk branch.
>>> >>>
>>> >>> JIRA: HDFS-17531 https://issues.apache.org/jira/browse/HDFS-17531
>>> >>> PR: https://github.com/apache/hadoop/pull/7308
>>> >>>
>>> >>> Here is the functionality introduction of the asynchronous router for
>>> >>> everyone to review:
>>> >>>
>>> >>> I. Overview
>>> >>>
>>> >>> The asynchronous router aims to address the performance bottleneck of
>>> >>> the synchronous router in high-concurrency, multi-nameservice scenarios.
>>> >>> By introducing an asynchronous processing mechanism, it optimizes the
>>> >>> request handling process and improves the system's concurrency and
>>> >>> resource utilization. It is particularly suitable for federated
>>> >>> scenarios where multiple downstream services (NS) need to be processed.
>>> >>>
>>> >>> II. Problems of the Synchronous Router
>>> >>>
>>> >>> - Performance Bottleneck: The performance of the synchronous router is
>>> >>>   limited by the number of handler threads.
>>> >>>   Even if the connection thread can still forward requests to the
>>> >>>   downstream namenode, the handler must wait for each request to
>>> >>>   complete before processing the next one, resulting in limited
>>> >>>   processing capacity.
>>> >>> - Thread Resource Waste: Increasing the number of handler threads to
>>> >>>   improve performance leads to more thread switches, which instead
>>> >>>   reduces system efficiency. At the same time, a large number of handler
>>> >>>   threads sit in a blocked state, wasting thread resources.
>>> >>> - Poor Isolation in Multi-ns: If one downstream nameservice performs
>>> >>>   poorly, handlers wait on it for a long time, which delays the
>>> >>>   forwarding of requests to the other, normally performing ns. The
>>> >>>   client therefore perceives a drop in the overall performance of the
>>> >>>   downstream ns services.
>>> >>> - Ineffective Utilization of Federation Multi-ns Performance: In
>>> >>>   high-concurrency scenarios, a large number of requests may be
>>> >>>   backlogged in the router's request queue while the queues of the
>>> >>>   downstream services are not fully utilized, leading to unreasonable
>>> >>>   resource allocation.
>>> >>>
>>> >>> III. Design and Improvements of the Asynchronous Router
>>> >>>
>>> >>> The asynchronous router solves the above problems by redesigning the
>>> >>> request handling process and introducing an asynchronous processing
>>> >>> mechanism. Its core improvements include:
>>> >>>
>>> >>> - Handler: Retrieves requests from the request queue for preliminary
>>> >>>   processing. If the request hits an exception (for example, the mount
>>> >>>   point does not exist), it puts the response directly into the response
>>> >>>   queue; otherwise, it sends the request to the asynchronous handler
>>> >>>   thread pool.
>>> >>> - Async Handler: Puts the request into the call queue
>>> >>>   (connection.calls) of the connection thread and returns immediately,
>>> >>>   without blocking to wait for the result.
>>> >>> - Async Responder: Processes the responses received by the connection
>>> >>>   thread. If the request needs to be re-initiated (for example, the
>>> >>>   downstream service returns a standby exception), it re-adds the
>>> >>>   request to the asynchronous handler thread pool; otherwise, it puts
>>> >>>   the response into the response queue.
>>> >>> - Responder: Retrieves the response from the response queue and returns
>>> >>>   it to the client.
>>> >>>
>>> >>> IV. Advantages of the Asynchronous Router
>>> >>>
>>> >>> - High-Concurrency Performance: Through the asynchronous processing
>>> >>>   mechanism, the asynchronous router can handle a large number of
>>> >>>   requests simultaneously, significantly improving the system's
>>> >>>   concurrent processing ability.
>>> >>> - High Resource Utilization: It avoids thread blocking and frequent
>>> >>>   switching, reduces thread resource waste, and improves the overall
>>> >>>   efficiency of the system.
>>> >>> - Isolation: Different ns are processed by different async handler
>>> >>>   thread pools, achieving isolation of different downstream services.
>>> >>>   Even if the performance of a certain service is poor, it will not
>>> >>>   affect the processing ability of other services.
>>> >>>
>>> >>> V. Summary
>>> >>>
>>> >>> The asynchronous router solves the performance bottleneck of the
>>> >>> traditional synchronous router in high-concurrency scenarios by
>>> >>> introducing an asynchronous processing mechanism.
>>> >>> It not only improves the system's concurrency and resource utilization
>>> >>> but also achieves isolation of downstream services through the queue
>>> >>> mechanism, enhancing the system's stability and adaptability. In
>>> >>> federated scenarios where multiple downstream services need to be
>>> >>> processed, the asynchronous router is a more efficient and reliable
>>> >>> solution.
>>> >>>
>>> >>> VI. Performance Testing
>>> >>>
>>> >>> https://docs.google.com/document/d/1meHOCvhm3XRHlIMwvKFidfUSjveTJrb8yAMasrM_HrY/edit?tab=t.0#heading=h.du0zlo2k5sb1
>>> >>>
>>> >>> VII. JIRA & PRs
>>> >>>
>>> >>> For more information, please refer to JIRA:
>>> >>> JIRA: RBF: Asynchronous router RPC:
>>> >>> https://issues.apache.org/jira/browse/HDFS-17531
>>> >>> PRs:
>>> >>> HDFS-17543. [ARR] AsyncUtil makes asynchronous code more concise and
>>> >>> easier.
>>> >>> HADOOP-19235. IPC client uses CompletableFuture to support asynchronous
>>> >>> operations.
>>> >>> HDFS-17544. [ARR] The router client rpc protocol PB supports asynchrony.
>>> >>> HDFS-17545. [ARR] router async rpc client.
>>> >>> HDFS-17594. [ARR] RouterCacheAdmin supports asynchronous rpc.
>>> >>> HDFS-17597. [ARR] RouterSnapshot supports asynchronous rpc.
>>> >>> HDFS-17595. [ARR] ErasureCoding supports asynchronous rpc.
>>> >>> HDFS-17601. [ARR] RouterRpcServer supports asynchronous rpc.
>>> >>> HDFS-17596. [ARR] RouterStoragePolicy supports asynchronous rpc.
>>> >>> HDFS-17656. [ARR] RouterNamenodeProtocol and RouterUserProtocol supports
>>> >>> asynchronous rpc.
>>> >>> HDFS-17659. [ARR] Router Quota supports asynchronous rpc.
>>> >>> HDFS-17672. [ARR] Move asynchronous related classes to the async
>>> >>> package.
>>> >>> HADOOP-19361. RPC DeferredMetrics bugfix.
>>> >>> HDFS-17640. [ARR] RouterClientProtocol supports asynchronous rpc.
>>> >>> HDFS-17650. [ARR] The router server-side rpc protocol PB supports
>>> >>> asynchrony.
>>> >>> HDFS-17651. [ARR] Async handler executor isolation.
>>> >>> ---------------------------------------------------------------------
>>> >>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>>> >>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
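
For readers following section III (Handler, Async Handler, Async Responder, Responder) and the Isolation point in section IV of the quoted proposal, the flow can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not the Hadoop implementation: the class AsyncRouterSketch, the latencyMillis map, the pool sizes, and the "response[...]" string format are all hypothetical, the downstream namenode is simulated with a scheduled task, and the standby retry path is left out. It assumes Java 9 or later.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Toy model of the stages described in the thread; this is not Hadoop code. */
final class AsyncRouterSketch implements AutoCloseable {
  // Hypothetical: simulated downstream latency per nameservice, in milliseconds.
  private final Map<String, Long> latencyMillis;
  // One async handler pool per nameservice, so a slow ns only occupies its own threads.
  private final Map<String, ExecutorService> asyncHandlers = new ConcurrentHashMap<>();
  // Stands in for the connection thread that talks to the downstream namenodes.
  private final ScheduledExecutorService connection =
      Executors.newSingleThreadScheduledExecutor();
  // Stands in for the async responder threads.
  private final ExecutorService asyncResponder = Executors.newFixedThreadPool(2);

  AsyncRouterSketch(Map<String, Long> latencyMillis) {
    this.latencyMillis = latencyMillis;
  }

  /** Handler stage: validate, hand off, and return without waiting for the reply. */
  CompletableFuture<String> handle(String ns, String op) {
    Long latency = latencyMillis.get(ns);
    if (latency == null) {
      // Error path: answer right away instead of going downstream.
      return CompletableFuture.failedFuture(
          new IllegalArgumentException("no mount point for " + ns));
    }
    ExecutorService pool =
        asyncHandlers.computeIfAbsent(ns, k -> Executors.newFixedThreadPool(2));
    CompletableFuture<String> reply = new CompletableFuture<>();
    // Async handler stage: enqueue the call on the connection and return at once.
    pool.execute(() -> connection.schedule(
        () -> { reply.complete(ns + ":" + op + " ok"); },
        latency, TimeUnit.MILLISECONDS));
    // Async responder stage: runs only once the simulated downstream reply has arrived.
    return reply.thenApplyAsync(r -> "response[" + r + "]", asyncResponder);
  }

  @Override
  public void close() {
    asyncHandlers.values().forEach(ExecutorService::shutdown);
    connection.shutdown();
    asyncResponder.shutdown();
  }

  public static void main(String[] args) throws Exception {
    try (AsyncRouterSketch router =
        new AsyncRouterSketch(Map.of("ns-fast", 10L, "ns-slow", 500L))) {
      // The slow request is submitted first but does not hold up the fast one.
      CompletableFuture<String> slow = router.handle("ns-slow", "getFileInfo");
      CompletableFuture<String> fast = router.handle("ns-fast", "getFileInfo");
      System.out.println(fast.get(2, TimeUnit.SECONDS));
      System.out.println(slow.get(2, TimeUnit.SECONDS));
    }
  }
}
```

The point of the sketch is that no thread sleeps while a request is in flight: handle() returns a future immediately, and the responder step is attached as a continuation rather than a blocking wait. Keeping one pool per nameservice mirrors the isolation idea in the proposal, although the real thread counts come from the router configuration rather than the fixed size used here.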