Re: [DISCUSS] Request to merge branch HDFS-17531 into trunk.

slfan1989 Fri, 14 Feb 2025 18:38:26 -0800

+1.

In addition to thanking the developers, special thanks to XiaoQiao He for
pushing this feature.


Best Regards,
- Shilun Fan

On Fri, Feb 14, 2025 at 3:40 PM Hui Fei <feihui.u...@gmail.com> wrote:

>
> I did some testing, following the documentation.
>
>    - Compiled the source code and built a local cluster; the async
>    feature worked fine.
>    - It is disabled by default.
>    - The thread count can be increased or decreased by changing the
>    related configurations.
>
> The feature works as described in the documentation. Great work, +1
>
> On Thu, Feb 13, 2025 at 2:29 PM Xiaoqiao He <hexiaoq...@apache.org> wrote:
>
>> Great. +1 from my side. Thanks.
>>
>> Best Regards,
>> - He Xiaoqiao
>>
>> On Thu, Feb 13, 2025 at 10:15 AM jian zhang <keeprom...@apache.org>
>> wrote:
>>
>>> Hi, He Xiaoqiao
>>>
>>> I have rebased HDFS-17531 again and resolved the conflicts. The current
>>> pipeline failure is unrelated to the ARR feature and was introduced by
>>> slfan1989's PR: HADOOP-19415. [JDK17] Upgrade JUnit from 4 to 5 in
>>> hadoop-common Part 1. (#7339). slfan1989 will fix it later.
>>>
>>> Best Regards,
>>> - Jian Zhang
>>>
>>> On Tue, Feb 11, 2025 at 11:15 AM Xiaoqiao He <hexiaoq...@apache.org> wrote:
>>>
>>> > Hi Jian Zhang, thanks for your great work. Please fix the conflicts
>>> > first; the rest makes sense to me.
>>> > I will give my +1 once it is ready.
>>> > One more thing: before checking in, we need to launch a separate
>>> > official vote thread. Good luck.
>>> >
>>> > BTW, Happy lunar new year!
>>> >
>>> > Best Regards,
>>> > - He Xiaoqiao
>>> >
>>> > On Thu, Feb 6, 2025 at 5:30 PM jian zhang <keeprom...@apache.org> wrote:
>>> >
>>> >> Hi, all,
>>> >> This feature has now been developed and has passed the pipeline.
>>> >> Please continue to help review it.
>>> >>
>>> >> Best Regards,
>>> >> Jian Zhang
>>> >>
>>> >> On Wed, Jan 22, 2025 at 6:22 PM Zhanghaobo <hfutzhan...@163.com> wrote:
>>> >>
>>> >>> @Hui Fei  Hi, Sir:
>>> >>>   Regarding your first point, I have created an umbrella JIRA,
>>> >>> https://issues.apache.org/jira/browse/HDFS-17716
>>> >>> and moved the non-core JIRAs under it.
>>> >>>
>>> >>> Best Wishes
>>> >>> Haobo Zhang
>>> >>>
>>> >>> ---- Replied Message ----
>>> >>> From: Hui Fei <feihui.u...@gmail.com>
>>> >>> Date: 01/22/2025 17:37
>>> >>> To: jian zhang <keeprom...@apache.org>
>>> >>> Cc: Hdfs-dev <hdfs-dev@hadoop.apache.org>, <priv...@hadoop.apache.org>,
>>> >>> Xiaoqiao He <hexiaoq...@apache.org>, <common-...@hadoop.apache.org>
>>> >>> Subject: Re: [DISCUSS] Request to merge branch HDFS-17531 into trunk.
>>> >>>
>>> >>> Got your idea. Thank you!
>>> >>> - How about removing the unfinished tasks and placing them under a new
>>> >>> task as subtasks, something like "ARR improvements"? If this feature is
>>> >>> marked complete while some tasks are still open, it looks strange.
>>> >>> - Will it take a long time to add the documentation? The discussion may
>>> >>> last for several days. If it takes a long time, I think it may block
>>> >>> the trunk release, and all community members will need to remember that
>>> >>> documentation is still required. That doesn't look good. That's my
>>> >>> thought; we can wait for others' opinions.
>>> >>>
>>> >>> On Wed, Jan 22, 2025 at 4:53 PM jian zhang <keeprom...@apache.org> wrote:
>>> >>>
>>> >>> Hi, Hui Fei,
>>> >>> - The remaining 3 subtasks are not related to the core functions of
>>> >>> the asynchronous router and have little impact on the trunk branch. We
>>> >>> can wait until HDFS-17531 is merged into trunk and then submit the
>>> >>> remaining PRs directly to trunk.
>>> >>> - It is indeed necessary to add documentation to
>>> >>> "HDFSRouterFederation.md". How about submitting a PR for this after
>>> >>> merging HDFS-17531 into the trunk branch?
>>> >>>
>>> >>> Best Regards,
>>> >>> Jian Zhang
>>> >>>
>>> >>> On Wed, Jan 22, 2025 at 4:24 PM Hui Fei <feihui.u...@gmail.com> wrote:
>>> >>>
>>> >>> Thanks for your great work, looking forward to this feature.
>>> >>>
>>> >>> Some comments from me.
>>> >>> - I checked and found that there are still 3 subtasks open under this
>>> >>> feature's JIRA ticket. Do they need to be resolved?
>>> >>> - I didn't find any documentation for this feature. It's a key
>>> >>> feature; should documentation be added to HDFSRouterFederation.md?
>>> >>>
>>> >>> On Wed, Jan 22, 2025 at 10:29 AM jian zhang <zjkeeprom...@gmail.com> wrote:
>>> >>>
>>> >>> Hi, all, the development of the asynchronous router functionality has
>>> >>> been completed. The development branch is HDFS-17531, and it is ready
>>> >>> to be merged into the trunk branch.
>>> >>>
>>> >>> JIRA: HDFS-17531 https://issues.apache.org/jira/browse/HDFS-17531
>>> >>> PR: https://github.com/apache/hadoop/pull/7308
>>> >>>
>>> >>> Here is an introduction to the asynchronous router for everyone to
>>> >>> review:
>>> >>>
>>> >>> I. Overview
>>> >>>
>>> >>> The asynchronous router aims to address the performance bottlenecks of
>>> >>> the synchronous router in high-concurrency, multi-nameservice
>>> >>> scenarios. By introducing an asynchronous processing mechanism, it
>>> >>> optimizes the request handling process and improves the system's
>>> >>> concurrency and resource utilization. It is particularly suitable for
>>> >>> federated scenarios where requests to multiple downstream nameservices
>>> >>> (NS) need to be processed.
>>> >>>
>>> >>> II. Problems of the Synchronous Router
>>> >>>
>>> >>> - Performance Bottleneck: The performance of the synchronous router
>>> >>> is limited by the number of handler threads. Even if the connection
>>> >>> thread can still forward requests to the downstream namenode, each
>>> >>> handler must wait for its current request to complete before
>>> >>> processing the next one, which limits processing capacity.
>>> >>> - Thread Resource Waste: Increasing the number of handler threads to
>>> >>> improve performance leads to more thread switches, which instead
>>> >>> reduces system efficiency. At the same time, a large number of handler
>>> >>> threads sit in a blocked state, wasting thread resources.
>>> >>> - Poor Isolation Across Nameservices: If one downstream nameservice
>>> >>> performs poorly, handlers wait on it for a long time, which delays the
>>> >>> forwarding of requests to other, normally performing nameservices. The
>>> >>> client then perceives a drop in the overall performance of the
>>> >>> downstream nameservices.
>>> >>> - Ineffective Use of Federated Multi-NS Capacity: In high-concurrency
>>> >>> scenarios, a large number of requests may be backlogged in the
>>> >>> router's request queue while the queues of the downstream services are
>>> >>> not fully utilized, so resources are allocated poorly.
>>> >>>
>>> >>> III. Design and Improvements of the Asynchronous Router
>>> >>>
>>> >>> The asynchronous router solves the above problems by redesigning the
>>> >>> request handling process and introducing an asynchronous processing
>>> >>> mechanism. Its core improvements include:
>>> >>>
>>> >>> - Handler: Retrieves requests from the request queue for preliminary
>>> >>> processing. If the request raises an exception at this stage (for
>>> >>> example, the mount point does not exist), it puts the response directly
>>> >>> into the response queue; otherwise, it sends the request to the
>>> >>> asynchronous handler thread pool.
>>> >>> - Async Handler: Puts the request into the call queue
>>> >>> (connection.calls) of the connection thread and returns immediately,
>>> >>> without blocking to wait for the result.
>>> >>> - Async Responder: Processes the responses received by the connection
>>> >>> thread. If the request needs to be reissued (for example, the
>>> >>> downstream service returns a standby exception), it re-adds the request
>>> >>> to the asynchronous handler thread pool; otherwise, it puts the
>>> >>> response into the response queue.
>>> >>> - Responder: Retrieves the response from the response queue and
>>> >>> returns it to the client.
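
The four roles above can be sketched roughly as follows. This is only an illustrative model, not the code on the HDFS-17531 branch: the class AsyncPipelineSketch, the Downstream interface, the "STANDBY" marker string, and the leading-slash check standing in for mount point resolution are all assumptions made up for this example. The only thing taken from the thread is the flow itself and the use of CompletableFuture (see HADOOP-19235).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class AsyncPipelineSketch {
  /** Hypothetical stand-in for a call sent through the connection thread. */
  interface Downstream {
    CompletableFuture<String> send(String request);
  }

  private final ExecutorService asyncHandlers = Executors.newFixedThreadPool(2);
  private final ExecutorService asyncResponders = Executors.newFixedThreadPool(2);
  private final BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
  private final Downstream downstream;

  AsyncPipelineSketch(Downstream downstream) {
    this.downstream = downstream;
  }

  /** Handler: preliminary check, then hand off without waiting for the result. */
  void handle(String request) {
    if (!request.startsWith("/")) {
      // Stand-in for "mount point does not exist": answer immediately.
      responseQueue.add("ERROR:" + request);
      return;
    }
    submit(request);
  }

  /** Async handler: starts the downstream call and returns immediately. */
  private void submit(String request) {
    asyncHandlers.execute(() -> downstream.send(request)
        .whenCompleteAsync((resp, err) -> respond(request, resp, err), asyncResponders));
  }

  /** Async responder: re-submits on a standby reply, otherwise queues the response. */
  private void respond(String request, String resp, Throwable err) {
    if (err != null) {
      responseQueue.add("ERROR:" + request);
    } else if ("STANDBY".equals(resp)) {
      // Unbounded retry keeps the sketch short; real code would bound retries.
      submit(request);
    } else {
      responseQueue.add(resp);
    }
  }

  /** Responder side: waits up to timeoutMs for the next response, or returns null. */
  String takeResponse(long timeoutMs) {
    try {
      return responseQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  void shutdown() {
    asyncHandlers.shutdown();
    asyncResponders.shutdown();
  }

  public static void main(String[] args) {
    AsyncPipelineSketch router =
        new AsyncPipelineSketch(req -> CompletableFuture.completedFuture("OK:" + req));
    router.handle("/data/file");
    System.out.println(router.takeResponse(5000));
    router.shutdown();
  }
}
```

The point of the sketch is that no thread in handle() or submit() waits for the downstream result; the continuation attached with whenCompleteAsync runs only once the reply arrives.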
>>> >>>
>>> >>> IV. Advantages of the Asynchronous Router
>>> >>>
>>> >>> - High-Concurrency Performance: Through the asynchronous processing
>>> >>> mechanism, the asynchronous router can handle a large number of
>>> >>> requests simultaneously, significantly improving the system's
>>> >>> concurrent processing ability.
>>> >>> - High Resource Utilization: It avoids thread blocking and frequent
>>> >>> switching, reduces thread resource waste, and improves the overall
>>> >>> efficiency of the system.
>>> >>> - Isolation: Different nameservices are processed by different async
>>> >>> handler thread pools, which isolates the downstream services from one
>>> >>> another. Even if one service performs poorly, it does not affect the
>>> >>> processing ability of the others.
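
The isolation point can be illustrated with one executor per nameservice. Again, this is a minimal sketch under stated assumptions, not the implementation from HDFS-17651: NsIsolationSketch, the nameservice IDs, and the pool size are hypothetical names and values chosen for the example, and the blocked task merely simulates a slow downstream nameservice.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class NsIsolationSketch {
  // One pool per nameservice ID, created lazily on first use.
  private final Map<String, ExecutorService> poolsByNs = new ConcurrentHashMap<>();
  private final int threadsPerNs;

  NsIsolationSketch(int threadsPerNs) {
    this.threadsPerNs = threadsPerNs;
  }

  /** Runs the call on the pool that belongs to the given nameservice only. */
  <T> CompletableFuture<T> submit(String nsId, Supplier<T> call) {
    ExecutorService pool =
        poolsByNs.computeIfAbsent(nsId, id -> Executors.newFixedThreadPool(threadsPerNs));
    return CompletableFuture.supplyAsync(call, pool);
  }

  void shutdown() {
    poolsByNs.values().forEach(ExecutorService::shutdown);
  }

  public static void main(String[] args) {
    NsIsolationSketch router = new NsIsolationSketch(1);
    CompletableFuture<Void> gate = new CompletableFuture<>();

    // The only thread of "ns-slow" is now stuck until the gate opens.
    CompletableFuture<String> slow =
        router.submit("ns-slow", () -> { gate.join(); return "slow done"; });

    // "ns-fast" has its own pool, so this completes while ns-slow is blocked.
    CompletableFuture<String> fast = router.submit("ns-fast", () -> "fast done");
    System.out.println(fast.join());

    gate.complete(null);
    System.out.println(slow.join());
    router.shutdown();
  }
}
```

With a single shared pool of the same size, the second call would have queued behind the blocked one; keeping a pool per nameservice is what lets it proceed.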
>>> >>>
>>> >>> V. Summary
>>> >>>
>>> >>> The asynchronous router solves the performance bottleneck of the
>>> >>> traditional synchronous router in high-concurrency scenarios by
>>> >>> introducing an asynchronous processing mechanism. It not only improves
>>> >>> the system's concurrency and resource utilization but also isolates
>>> >>> downstream services through the queue mechanism, enhancing the
>>> >>> system's stability and adaptability. In federated scenarios where
>>> >>> multiple downstream services need to be processed, the asynchronous
>>> >>> router is a more efficient and reliable solution.
>>> >>>
>>> >>> VI. Performance Testing
>>> >>>
>>> >>> https://docs.google.com/document/d/1meHOCvhm3XRHlIMwvKFidfUSjveTJrb8yAMasrM_HrY/edit?tab=t.0#heading=h.du0zlo2k5sb1
>>> >>>
>>> >>> VII. JIRA & PRs
>>> >>>
>>> >>> For more information, please refer to JIRA:
>>> >>> JIRA: RBF: Asynchronous router RPC:
>>> >>> https://issues.apache.org/jira/browse/HDFS-17531
>>> >>> PRs:
>>> >>> HDFS-17543. [ARR] AsyncUtil makes asynchronous code more concise and
>>> >>> easier.
>>> >>> HADOOP-19235. IPC client uses CompletableFuture to support
>>> >>> asynchronous operations.
>>> >>> HDFS-17544. [ARR] The router client rpc protocol PB supports
>>> >>> asynchrony.
>>> >>> HDFS-17545. [ARR] router async rpc client.
>>> >>> HDFS-17594. [ARR] RouterCacheAdmin supports asynchronous rpc.
>>> >>> HDFS-17597. [ARR] RouterSnapshot supports asynchronous rpc.
>>> >>> HDFS-17595. [ARR] ErasureCoding supports asynchronous rpc.
>>> >>> HDFS-17601. [ARR] RouterRpcServer supports asynchronous rpc.
>>> >>> HDFS-17596. [ARR] RouterStoragePolicy supports asynchronous rpc.
>>> >>> HDFS-17656. [ARR] RouterNamenodeProtocol and RouterUserProtocol
>>> >>> supports asynchronous rpc.
>>> >>> HDFS-17659. [ARR] Router Quota supports asynchronous rpc.
>>> >>> HDFS-17672. [ARR] Move asynchronous related classes to the async
>>> >>> package.
>>> >>> HADOOP-19361. RPC DeferredMetrics bugfix.
>>> >>> HDFS-17640. [ARR] RouterClientProtocol supports asynchronous rpc.
>>> >>> HDFS-17650. [ARR] The router server-side rpc protocol PB supports
>>> >>> asynchrony.
>>> >>> HDFS-17651. [ARR] Async handler executor isolation.
