Re: [DISCUSS] Request to merge branch HDFS-17531 into trunk.

jian zhang Wed, 12 Feb 2025 18:15:39 -0800

Hi, He Xiaoqiao

I have rebased HDFS-17531 again and resolved the conflicts. The current
pipeline failure is unrelated to the ARR feature and was introduced by
slfan1989's PR: HADOOP-19415. [JDK17] Upgrade JUnit from 4 to 5 in
hadoop-common Part 1. (#7339). slfan1989 will fix it later.


Best Regards,
- Jian Zhang

Xiaoqiao He <[email protected]> wrote on Tue, Feb 11, 2025 at 11:15:

> Hi Jian Zhang, thanks for your great work. Please fix the conflicts first;
> everything else makes sense to me.
> I will give my +1 once it is ready.
> One more thing: before check-in we need to launch a separate official vote
> thread. Good luck.
>
> BTW, Happy lunar new year!
>
> Best Regards,
> - He Xiaoqiao
>
> On Thu, Feb 6, 2025 at 5:30 PM jian zhang <[email protected]> wrote:
>
>> Hi, all,
>> Development of this feature is now complete and the pipeline passes.
>> Please continue to help review it.
>>
>> Best Regards,
>> Jian Zhang
>>
>> Zhanghaobo <[email protected]> wrote on Wed, Jan 22, 2025 at 18:22:
>>
>>> @Hui Fei  Hi, Sir:
>>>   Regarding your first point, I have created an umbrella JIRA,
>>> https://issues.apache.org/jira/browse/HDFS-17716,
>>> and moved the non-core JIRAs under it.
>>>
>>> Best Wishes
>>> Haobo Zhang
>>>
>>> ---- Replied Message ----
>>> From Hui Fei<[email protected]> <[email protected]>
>>> Date 01/22/2025 17:37
>>> To jian zhang<[email protected]> <[email protected]>
>>> Cc Hdfs-dev<[email protected]> ,
>>> <[email protected]> <[email protected]> ,
>>> <[email protected]> Xiaoqiao He<[email protected]> ,
>>> <[email protected]> <[email protected]>
>>> <[email protected]>
>>> Subject Re: [DISCUSS] Request to merge branch HDFS-17531 into trunk.
>>> Got your idea. Thank you!
>>> - How about removing the unfinished tasks and placing them under a new
>>> task as subtasks, such as "ARR improvements"? If this feature is marked
>>> complete but still has open tasks, it looks strange.
>>> - Will it take a long time to add documentation? The discussion may last
>>> several days. If it takes a long time, I think it may block the trunk
>>> release, and all community members would need to remember that
>>> documentation is still required. That doesn't look good. That's my
>>> thought, and we can wait for others' opinions.
>>>
>>> jian zhang <[email protected]> wrote on Wed, Jan 22, 2025 at 16:53:
>>>
>>> Hi, Hui Fei,
>>> - The remaining 3 sub-tasks are not related to the core functions of
>>> the asynchronous router and have little impact on the trunk branch. We
>>> can wait until HDFS-17531 is merged into trunk and then submit the
>>> remaining PRs directly to trunk.
>>> - It is indeed necessary to add documentation to
>>> "HDFSRouterFederation.md". How about submitting a PR for this after
>>> merging HDFS-17531 into the trunk branch?
>>>
>>> Best Regards,
>>> Jian Zhang
>>>
>>> Hui Fei <[email protected]> wrote on Wed, Jan 22, 2025 at 16:24:
>>>
>>> Thanks for your great work, looking forward to this feature.
>>>
>>> Some comments from me.
>>> - I checked and found that there are still 3 sub-tasks open under this
>>> feature's JIRA ticket. Do they need to be resolved?
>>> - I didn't find the documentation for this feature. It's a key feature;
>>> is it necessary to add documentation to HDFSRouterFederation.md?
>>>
>>> jian zhang <[email protected]> wrote on Wed, Jan 22, 2025 at 10:29:
>>>
>>> Hi, all, the development of the asynchronous router functionality has
>>> been completed. The development branch is HDFS-17531, and it is ready to
>>> be merged into the trunk branch.
>>>
>>> JIRA: HDFS-17531 https://issues.apache.org/jira/browse/HDFS-17531
>>> PR: https://github.com/apache/hadoop/pull/7308
>>>
>>> Here is the functionality introduction of the asynchronous router for
>>> everyone to review:
>>> I. Overview
>>>
>>> The asynchronous router aims to address the performance bottleneck
>>> of the synchronous router in high-concurrency, multi-nameservice
>>> scenarios. By introducing an asynchronous processing mechanism, it
>>> optimizes the request handling process and improves the system's
>>> concurrency and resource utilization. It is particularly suitable for
>>> federated scenarios where multiple downstream services (NS) need to be
>>> processed.
>>>
>>> II. Problems of the Synchronous Router
>>>
>>> - Performance Bottleneck: The performance of the synchronous router
>>> is limited by the number of handler threads. Even if the connection
>>> thread can still forward requests to the downstream namenode, each
>>> handler must wait for its current request to complete before processing
>>> the next one, which limits processing capacity.
>>> - Thread Resource Waste: Increasing the number of handler threads to
>>> improve performance leads to more thread context switches, which
>>> instead reduces system efficiency. At the same time, a large number of
>>> handler threads sit in a blocked state, wasting thread resources.
>>> - Poor Isolation Across Nameservices: If one downstream nameservice
>>> performs poorly, handlers wait on it for a long time, which delays the
>>> forwarding of requests to the other, normally performing nameservices.
>>> The client then perceives a drop in the overall performance of the
>>> downstream nameservices.
>>> - Ineffective Utilization of Federation Multi-ns Performance: In
>>> high-concurrency scenarios, a large number of requests may be
>>> backlogged in the router's request queue while the queues of the
>>> downstream services are not fully utilized, leading to unbalanced
>>> resource allocation.
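
The handler-thread limit described above can be estimated with simple arithmetic. The following is a back-of-the-envelope sketch, not a measurement: it assumes every handler thread blocks for the full downstream round trip and ignores queueing and router-side CPU cost. The class name, method names, and all numbers (100 handlers, 10 ms and 500 ms latencies, 10% slow traffic) are made up for illustration.

```java
/** Illustrative capacity estimate for a blocking (synchronous) router. */
class SyncRouterCapacity {

  /** Upper bound on requests per second when each handler blocks per request. */
  static double maxRequestsPerSecond(int handlers, double meanLatencyMs) {
    return handlers * 1000.0 / meanLatencyMs;
  }

  /** Mean time a handler is blocked when a fraction of calls hit a slow nameservice. */
  static double meanLatencyMs(double slowFraction, double slowMs, double fastMs) {
    return slowFraction * slowMs + (1.0 - slowFraction) * fastMs;
  }

  public static void main(String[] args) {
    // All traffic goes to a healthy nameservice answering in 10 ms.
    System.out.println(maxRequestsPerSecond(100, 10.0));

    // 10% of traffic goes to a slow nameservice answering in 500 ms.
    double mixed = meanLatencyMs(0.10, 500.0, 10.0);
    System.out.println(maxRequestsPerSecond(100, mixed));
  }
}
```

Under these assumptions, a small share of slow calls raises the mean blocking time from 10 ms to about 59 ms, so the same 100 handlers can serve only roughly one sixth of the healthy-case request rate. This matches the isolation problem listed above.
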
>>>
>>> III. Design and Improvements of the Asynchronous Router
>>>
>>> The asynchronous router solves the above problems by redesigning the
>>> request handling process and introducing an asynchronous processing
>>> mechanism. Its core improvements include:
>>>
>>> - Handler: Retrieves requests from the request queue for preliminary
>>> processing. If the request fails a check (for example, the mount point
>>> does not exist), it puts the response directly into the response
>>> queue; otherwise, it sends the request to the asynchronous handler
>>> thread pool.
>>> - Async Handler: Puts the request into the call queue
>>> (connection.calls) of the connection thread and returns immediately
>>> without blocking to wait for the result.
>>> - Async Responder: Processes the responses received by the connection
>>> thread. If the request needs to be retried (for example, the downstream
>>> service returns a standby exception), it re-adds the request to the
>>> asynchronous handler thread pool; otherwise, it puts the response into
>>> the response queue.
>>> - Responder: Retrieves the response from the response queue and
>>> returns it to the client.
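
The four roles above can be sketched with `CompletableFuture` and two thread pools. This is a minimal illustration under stated assumptions, not code from the HDFS-17531 branch: the class `AsyncRouterSketch`, the `/mnt/` mount check, the `MAX_ATTEMPTS` retry limit, and the `downstream` function standing in for the non-blocking namenode call are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Illustrative only: none of these names come from the HDFS-17531 branch. */
class AsyncRouterSketch {
  static final int MAX_ATTEMPTS = 3; // hypothetical retry limit

  private final BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
  private final ExecutorService asyncHandler = daemonPool("async-handler");
  private final ExecutorService asyncResponder = daemonPool("async-responder");
  // Stand-in for the non-blocking call to a downstream namenode.
  private final Function<String, CompletableFuture<String>> downstream;

  AsyncRouterSketch(Function<String, CompletableFuture<String>> downstream) {
    this.downstream = downstream;
  }

  /** Handler: preliminary checks, then hand off without waiting. */
  void handle(String path) {
    if (!path.startsWith("/mnt/")) { // hypothetical mount-point check
      responseQueue.add("ERROR: no mount point for " + path);
      return;
    }
    submit(path, 1);
  }

  /** Async handler: issue the downstream call and return immediately. */
  private void submit(String path, int attempt) {
    asyncHandler.execute(() -> downstream.apply(path)
        // Async responder: runs once the downstream reply arrives.
        .whenCompleteAsync((result, error) -> {
          if (error == null) {
            responseQueue.add(result);
          } else if (attempt < MAX_ATTEMPTS) {
            submit(path, attempt + 1); // e.g. after a standby exception
          } else {
            responseQueue.add("ERROR: giving up on " + path);
          }
        }, asyncResponder));
  }

  /** Responder side: take the next response, or null on timeout. */
  String awaitResponse(long timeoutMillis) {
    try {
      return responseQueue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  private static ExecutorService daemonPool(String name) {
    return Executors.newFixedThreadPool(2, r -> {
      Thread t = new Thread(r, name);
      t.setDaemon(true); // do not keep the JVM alive for the sketch
      return t;
    });
  }

  public static void main(String[] args) {
    AsyncRouterSketch router = new AsyncRouterSketch(
        path -> CompletableFuture.completedFuture("ok:" + path));
    router.handle("/mnt/data/file");
    router.handle("/unmounted/file");
    System.out.println(router.awaitResponse(1000));
    System.out.println(router.awaitResponse(1000));
  }
}
```

In this sketch no thread ever waits on a downstream reply: the handler returns after the hand-off, and the retry decision is made in the completion callback. The two responses in `main` may arrive in either order, because the mount-point error is queued synchronously while the successful call completes on another thread.
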
>>>
>>> IV. Advantages of the Asynchronous Router
>>>
>>> - High-Concurrency Performance: Through the asynchronous processing
>>> mechanism, the asynchronous router can handle a large number of
>>> requests simultaneously, significantly improving the system's
>>> concurrent processing ability.
>>> - High Resource Utilization: It avoids thread blocking and frequent
>>> switching, reduces thread resource waste, and improves the overall
>>> efficiency of the system.
>>> - Isolation: Different ns are processed by different async handler
>>> thread pools, which isolates the downstream services from each other.
>>> Even if one service performs poorly, it does not affect the processing
>>> of requests for the other services.
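
The isolation point above can be illustrated with one executor per nameservice. This is a hedged sketch of the general technique, not the implementation in HDFS-17651: the class `NsIsolationSketch`, its methods, the pool size, and the nameservice IDs `ns1` and `ns2` are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Illustrative only: one async handler pool per nameservice. */
class NsIsolationSketch {
  private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
  private final int threadsPerNs;

  NsIsolationSketch(int threadsPerNs) {
    this.threadsPerNs = threadsPerNs;
  }

  /** Returns the pool dedicated to this nameservice, creating it on first use. */
  ExecutorService poolFor(String nsId) {
    return pools.computeIfAbsent(nsId, ns ->
        Executors.newFixedThreadPool(threadsPerNs, r -> {
          Thread t = new Thread(r, "async-handler-" + ns);
          t.setDaemon(true); // do not keep the JVM alive for the sketch
          return t;
        }));
  }

  /** Runs the call on the pool of its own nameservice only. */
  <T> CompletableFuture<T> submit(String nsId, Supplier<T> call) {
    return CompletableFuture.supplyAsync(call, poolFor(nsId));
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    NsIsolationSketch router = new NsIsolationSketch(1);
    // ns1 is slow and occupies its single thread for a while.
    CompletableFuture<String> slow = router.submit("ns1", () -> {
      sleep(500);
      return "ns1 done";
    });
    // ns2 has its own pool, so it does not queue behind ns1.
    CompletableFuture<String> fast = router.submit("ns2", () -> "ns2 done");
    System.out.println(fast.join());
    System.out.println(slow.join());
  }
}
```

Because a slow nameservice can only exhaust its own pool, calls to other nameservices are not queued behind it. A single shared pool would not give this guarantee.
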
>>>
>>> V. Summary
>>>
>>> The asynchronous router solves the performance bottleneck of the
>>> traditional synchronous router in high-concurrency scenarios by
>>> introducing an asynchronous processing mechanism. It not only improves
>>> the system's concurrency and resource utilization but also isolates
>>> downstream services from each other through the queue mechanism,
>>> enhancing the system's stability and adaptability. In federated
>>> scenarios where multiple downstream services need to be processed, the
>>> asynchronous router is a more efficient and reliable solution.
>>>
>>> VI. Performance Testing
>>>
>>>
>>>
>>> https://docs.google.com/document/d/1meHOCvhm3XRHlIMwvKFidfUSjveTJrb8yAMasrM_HrY/edit?tab=t.0#heading=h.du0zlo2k5sb1
>>>
>>> VII. JIRA & PRs
>>>
>>> For more information, please refer to the JIRA:
>>> RBF: Asynchronous router RPC:
>>> https://issues.apache.org/jira/browse/HDFS-17531
>>> PRs:
>>> HDFS-17543. [ARR] AsyncUtil makes asynchronous code more concise and
>>> easier.
>>> HADOOP-19235. IPC client uses CompletableFuture to support
>>> asynchronous operations.
>>> HDFS-17544. [ARR] The router client rpc protocol PB supports
>>> asynchrony.
>>> HDFS-17545. [ARR] router async rpc client.
>>> HDFS-17594. [ARR] RouterCacheAdmin supports asynchronous rpc.
>>> HDFS-17597. [ARR] RouterSnapshot supports asynchronous rpc.
>>> HDFS-17595. [ARR] ErasureCoding supports asynchronous rpc.
>>> HDFS-17601. [ARR] RouterRpcServer supports asynchronous rpc.
>>> HDFS-17596. [ARR] RouterStoragePolicy supports asynchronous rpc.
>>> HDFS-17656. [ARR] RouterNamenodeProtocol and RouterUserProtocol
>>> supports asynchronous rpc.
>>> HDFS-17659. [ARR] Router Quota supports asynchronous rpc.
>>> HDFS-17672. [ARR] Move asynchronous related classes to the async
>>> package.
>>> HADOOP-19361. RPC DeferredMetrics bugfix.
>>> HDFS-17640. [ARR] RouterClientProtocol supports asynchronous rpc.
>>> HDFS-17650. [ARR] The router server-side rpc protocol PB supports
>>> asynchrony.
>>> HDFS-17651. [ARR] Async handler executor isolation.
