Hi, Hui Fei,

Thank you for your suggestion. Adding the documentation won't take much
time. I can create a subtask for it soon. You can focus on the main
functionality first. Once the documentation is ready, I will open a PR for
everyone to review.

Best Regards,
Jian Zhang

Hui Fei <feihui.u...@gmail.com> wrote on Wed, Jan 22, 2025 at 17:39:

> Got your idea. Thank you!
> - How about moving the unfinished tasks out and placing them under a new
> task as subtasks, like "ARR improvements"? If this feature is completed but
> still has some open tasks, it looks strange.
> - Will it take a long time to add documentation? The discussion may last
> for several days. If it takes a long time, I think it may block the trunk
> release, and all community members would need to remember that
> documentation is still required. That doesn't look good. That's my thought,
> and we can wait for others' opinions.
>
> jian zhang <keeprom...@apache.org> wrote on Wed, Jan 22, 2025 at 16:53:
>
>> Hi, Hui Fei,
>>     - The remaining 3 subtasks are not related to the core functions of
>> the asynchronous router and have little impact on the trunk branch. We
>> can wait until HDFS-17531 is merged into trunk and then submit the
>> remaining PRs directly to trunk.
>>     - It is indeed necessary to add documentation to
>> "HDFSRouterFederation.md". How about submitting a PR for this after
>> merging HDFS-17531 into the trunk branch?
>>
>> Best Regards,
>> Jian Zhang
>>
>> Hui Fei <feihui.u...@gmail.com> wrote on Wed, Jan 22, 2025 at 16:24:
>>
>>> Thanks for your great work, looking forward to this feature.
>>>
>>> Some comments from me.
>>>  - I checked and found that there are still 3 open subtasks under this
>>> feature's JIRA ticket. Do they need to be resolved?
>>>  - I didn't find any documentation for this feature. Since it's a key
>>> feature, should documentation be added to HDFSRouterFederation.md?
>>>
>>> jian zhang <zjkeeprom...@gmail.com> wrote on Wed, Jan 22, 2025 at 10:29:
>>>
>>>> Hi, all, the development of the asynchronous router functionality has
>>>> been completed. The development branch is HDFS-17531, and it is ready to be
>>>> merged into the trunk branch.
>>>>
>>>> JIRA: HDFS-17531 https://issues.apache.org/jira/browse/HDFS-17531
>>>> PR: https://github.com/apache/hadoop/pull/7308
>>>>
>>>> Here is an introduction to the asynchronous router for everyone to
>>>> review:
>>>> I. Overview
>>>>
>>>>     The asynchronous router aims to address the performance bottlenecks
>>>> of the synchronous router in high-concurrency, multi-nameservice
>>>> scenarios. By introducing an asynchronous processing mechanism, it
>>>> optimizes the request handling process and improves the system's
>>>> concurrency and resource utilization. It is particularly suitable for
>>>> federated scenarios where requests must be forwarded to multiple
>>>> downstream nameservices (NS).
>>>>
>>>> II. Problems of the Synchronous Router
>>>>
>>>>     - Performance Bottleneck: The performance of the synchronous router
>>>> is limited by the number of handler threads. Even if the connection thread
>>>> can still forward requests to the downstream namenode, the handler must
>>>> wait for each request to complete before processing the next one, resulting
>>>> in limited processing capacity.
>>>>     - Thread Resource Waste: Increasing the number of handler threads to
>>>> improve performance leads to more thread switches, which instead reduces
>>>> system efficiency. At the same time, a large number of handler threads
>>>> sit in a blocked state, wasting thread resources.
>>>>     - Poor Isolation in Multi-ns: If one downstream nameservice performs
>>>> poorly, handlers wait on it for a long time. This delays the forwarding
>>>> of requests to the other, normally performing ns and lowers the overall
>>>> downstream performance perceived by the client.
>>>>     - Ineffective Utilization of Federation Multi-ns Performance: In
>>>> high-concurrency scenarios, a large number of requests may be backlogged
>>>> in the router's request queue while the queues of the downstream services
>>>> are not fully utilized, leading to unreasonable resource allocation.
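
To make the handler-thread limit above concrete, here is a back-of-the-envelope sketch. It assumes every handler blocks for the full downstream round trip, so a single handler completes at most 1/latency requests per second. The class name, the method name, and the numbers are hypothetical and are not measurements from the router.

```java
// Illustrative only: not part of HDFS-17531.
class SyncRouterBound {
  /** Upper bound on requests/second when each handler blocks for the whole downstream call. */
  static double maxRequestsPerSecond(int handlerThreads, double downstreamLatencyMillis) {
    return handlerThreads * 1000.0 / downstreamLatencyMillis;
  }

  public static void main(String[] args) {
    // Hypothetical numbers: 100 handlers, 50 ms per downstream call.
    System.out.println(maxRequestsPerSecond(100, 50.0)); // prints 2000.0
  }
}
```

Under this simplified model the bound scales only with the handler count, which is why a synchronous router has to add threads to gain throughput.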
>>>>
>>>> III. Design and Improvements of the Asynchronous Router
>>>>
>>>>     The asynchronous router solves the above problems by redesigning
>>>> the request handling process and introducing an asynchronous processing
>>>> mechanism. Its core improvements include:
>>>>
>>>>     - Handler: Retrieves requests from the request queue for
>>>> preliminary processing. If the request fails at this stage (for example,
>>>> the mount point does not exist), it puts the response directly into the
>>>> response queue; otherwise, it sends the request to the asynchronous
>>>> handler thread pool.
>>>>     - Async Handler: Puts the request into the call queue
>>>> (connection.calls) of the connection thread and returns immediately
>>>> without blocking to wait for the result.
>>>>     - Async Responder: Processes the responses received by the connection
>>>> thread. If the request needs to be re-initiated (for example, the
>>>> downstream service returns a standby exception), it re-adds the request
>>>> to the asynchronous handler thread pool; otherwise, it puts the response
>>>> into the response queue.
>>>>     - Responder: Retrieves the response from the response queue and
>>>> returns it to the client.
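
The four stages above can be sketched with `CompletableFuture` (Java 9+). This is a minimal illustration under stated assumptions, not the HDFS-17531 code: every class, method, and constant name is hypothetical, the mount table is a hard-coded map, and the downstream namenode is simulated by a method that reports "standby" on the first attempt. The connection thread and its call queue are collapsed into a task on the async handler pool.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: all names are hypothetical, not the HDFS-17531 classes.
class AsyncRouterSketch {
  // Hypothetical mount table: path prefix -> nameservice.
  static final Map<String, String> MOUNTS = Map.of("/data", "ns0", "/logs", "ns1");
  static final int MAX_RETRIES = 1;

  static final ExecutorService ASYNC_HANDLER = daemonPool(2);
  static final ExecutorService ASYNC_RESPONDER = daemonPool(2);

  static ExecutorService daemonPool(int threads) {
    return Executors.newFixedThreadPool(threads, r -> {
      Thread t = new Thread(r);
      t.setDaemon(true); // let the JVM exit without an explicit shutdown
      return t;
    });
  }

  // Stand-in for the downstream namenode: the first attempt hits a standby.
  static String callNamenode(String ns, String path, int attempt) {
    if (attempt == 0) {
      throw new IllegalStateException("standby");
    }
    return ns + ":" + path;
  }

  // Handler: preliminary processing. A bad request is answered right away;
  // a valid one is handed to the async handler pool.
  static CompletableFuture<String> handle(String path) {
    String ns = null;
    for (Map.Entry<String, String> e : MOUNTS.entrySet()) {
      if (path.startsWith(e.getKey())) {
        ns = e.getValue();
      }
    }
    if (ns == null) {
      return CompletableFuture.failedFuture(
          new IllegalArgumentException("mount point does not exist: " + path));
    }
    return forward(ns, path, 0);
  }

  // Async handler: submits the downstream call and returns immediately.
  // Async responder: inspects the outcome and either re-submits or completes.
  static CompletableFuture<String> forward(String ns, String path, int attempt) {
    return CompletableFuture
        .supplyAsync(() -> callNamenode(ns, path, attempt), ASYNC_HANDLER)
        .handleAsync((result, error) -> {
          CompletableFuture<String> next;
          if (error == null) {
            next = CompletableFuture.completedFuture(result);
          } else if (attempt < MAX_RETRIES) {
            next = forward(ns, path, attempt + 1); // e.g. standby: re-submit
          } else {
            next = CompletableFuture.failedFuture(error);
          }
          return next;
        }, ASYNC_RESPONDER)
        .thenCompose(f -> f);
  }

  public static void main(String[] args) {
    // Responder: deliver the response (or the error) to the client.
    System.out.println(handle("/data/file").join());
    handle("/missing")
        .exceptionally(e -> "error: " + e.getMessage())
        .thenAccept(System.out::println)
        .join();
  }
}
```

No thread in this sketch waits on a downstream result: `forward` returns a future immediately, and the retry is just another submission to the async handler pool.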
>>>>
>>>> IV. Advantages of the Asynchronous Router
>>>>
>>>>     - High-Concurrency Performance: Through the asynchronous
>>>> processing mechanism, the asynchronous router can handle a large number of
>>>> requests simultaneously, significantly improving the system's concurrent
>>>> processing ability.
>>>>     - High Resource Utilization: It avoids thread blocking and frequent
>>>> switching, reduces thread resource waste, and improves the overall
>>>> efficiency of the system.
>>>>     - Isolation: Different ns are processed by different async handler
>>>> thread pools, achieving isolation of different downstream services. Even if
>>>> the performance of a certain service is poor, it will not affect the
>>>> processing ability of other services.
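
The isolation point can be illustrated with one thread pool per nameservice. This is a sketch under assumptions, not the HDFS-17651 implementation: `PerNameserviceExecutors`, `submit`, and the pool size are hypothetical, and a latch stands in for a slow downstream nameservice. It uses `orTimeout`, so it needs Java 9+.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative only: a hypothetical helper, not the HDFS-17651 implementation.
class PerNameserviceExecutors {
  private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
  private final int threadsPerNs;

  PerNameserviceExecutors(int threadsPerNs) {
    this.threadsPerNs = threadsPerNs;
  }

  // One pool per nameservice, created lazily on first use.
  private ExecutorService poolFor(String ns) {
    return pools.computeIfAbsent(ns, k ->
        Executors.newFixedThreadPool(threadsPerNs, r -> {
          Thread t = new Thread(r, "async-handler-" + k);
          t.setDaemon(true); // let the JVM exit without an explicit shutdown
          return t;
        }));
  }

  // Runs the call on the pool that belongs to the given nameservice.
  <T> CompletableFuture<T> submit(String ns, Supplier<T> call) {
    return CompletableFuture.supplyAsync(call, poolFor(ns));
  }

  public static void main(String[] args) {
    PerNameserviceExecutors executors = new PerNameserviceExecutors(1);
    CountDownLatch gate = new CountDownLatch(1);
    // ns0 is stuck: its only handler thread blocks until the gate opens.
    CompletableFuture<String> slow = executors.submit("ns0", () -> {
      try {
        gate.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "ns0 done";
    });
    // ns1 still answers, because it does not share ns0's threads.
    String fast = executors.submit("ns1", () -> "ns1 done")
        .orTimeout(2, TimeUnit.SECONDS)
        .join();
    System.out.println(fast + ", ns0 finished: " + slow.isDone());
    gate.countDown();
    System.out.println(slow.join());
  }
}
```

With a single shared pool of the same size, the `ns1` call would queue behind the blocked `ns0` task and hit the timeout instead.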
>>>>
>>>> V. Summary
>>>>
>>>>     The asynchronous router solves the performance bottleneck problem
>>>> of the traditional synchronous router in high-concurrency scenarios by
>>>> introducing an asynchronous processing mechanism. It not only improves the
>>>> system's concurrency ability and resource utilization but also achieves
>>>> isolation of downstream services through the queue mechanism, enhancing the
>>>> system's stability and adaptability. In the federated scenarios where
>>>> multiple downstream services need to be processed, the asynchronous router
>>>> is a more efficient and reliable solution.
>>>>
>>>> VI. Performance Testing
>>>>
>>>>
>>>> https://docs.google.com/document/d/1meHOCvhm3XRHlIMwvKFidfUSjveTJrb8yAMasrM_HrY/edit?tab=t.0#heading=h.du0zlo2k5sb1
>>>>
>>>> VII. JIRA & PRs
>>>>
>>>>     For more information, please refer to JIRA:
>>>>     JIRA: RBF: Asynchronous router RPC:
>>>> https://issues.apache.org/jira/browse/HDFS-17531
>>>>     PRs:
>>>>     HDFS-17543. [ARR] AsyncUtil makes asynchronous code more concise
>>>> and easier.
>>>>     HADOOP-19235. IPC client uses CompletableFuture to support
>>>> asynchronous operations.
>>>>     HDFS-17544. [ARR] The router client rpc protocol PB supports
>>>> asynchrony.
>>>>     HDFS-17545. [ARR] router async rpc client.
>>>>     HDFS-17594. [ARR] RouterCacheAdmin supports asynchronous rpc.
>>>>     HDFS-17597. [ARR] RouterSnapshot supports asynchronous rpc.
>>>>     HDFS-17595. [ARR] ErasureCoding supports asynchronous rpc.
>>>>     HDFS-17601. [ARR] RouterRpcServer supports asynchronous rpc.
>>>>     HDFS-17596. [ARR] RouterStoragePolicy supports asynchronous rpc.
>>>>     HDFS-17656. [ARR] RouterNamenodeProtocol and RouterUserProtocol
>>>> supports asynchronous rpc.
>>>>     HDFS-17659. [ARR] Router Quota supports asynchronous rpc.
>>>>     HDFS-17672. [ARR] Move asynchronous related classes to the async
>>>> package.
>>>>     HADOOP-19361. RPC DeferredMetrics bugfix.
>>>>     HDFS-17640. [ARR] RouterClientProtocol supports asynchronous rpc.
>>>>     HDFS-17650. [ARR] The router server-side rpc protocol PB supports
>>>> asynchrony.
>>>>     HDFS-17651. [ARR] Async handler executor isolation.
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>>>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>>>>
>>>>
