Re: [Discuss] RBF: Asynchronous router RPC.

zhangjian Mon, 20 May 2024 20:21:50 -0700

Thank you for your interest in this feature. You can step through the unit
tests (UTs) provided in the PR with a debugger to better understand how the
current asynchronous calls work.


> On May 21, 2024, at 02:04, Simbarashe Dzinamarira <[email protected]> wrote:
> 
> Excited to see this feature as well. I'll spend more time understanding the
> proposal and implementation.
> 
> On Mon, May 20, 2024 at 7:55 AM zhangjian <[email protected]> wrote:
> 
>> Hi Yuanbo Liu, thank you for your interest in this feature. I think the
>> difficulty of an asynchronous router lies not only in implementing the
>> asynchronous functions, but also in keeping the code readable and
>> reusable, so that the community can keep developing it. At the beginning
>> I also planned to use the virtual threads you mentioned. Virtual threads
>> can make code asynchronous elegantly, but the biggest problem is that
>> upgrading the JDK version is not easy, either in the community or in real
>> production environments. Therefore, I switched to CompletableFuture, which
>> is already available in JDK 8, to make the RPC path asynchronous. The
>> router is stateless and its RPC flow is very clear, so even though
>> CompletableFuture itself is less readable than virtual threads, a good
>> design can still make the asynchronous flow easy to follow.
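A minimal sketch of the CompletableFuture idea, assuming a hypothetical single handler thread and a hypothetical `downstreamCall` that stands in for sending a call over a connection. This is not the code in the PR; all class and method names are illustrative. The handler only submits the call and registers a callback that puts the response into a response queue, so it can take the next call immediately instead of waiting.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

class AsyncHandlerSketch {
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical stand-in for a downstream RPC: completes on `connection` after ~100 ms.
    static CompletableFuture<String> downstreamCall(int id, Executor connection) {
        return CompletableFuture.supplyAsync(() -> {
            maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                Thread.sleep(100); // simulated downstream ns latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inFlight.decrementAndGet();
            return "response-" + id;
        }, connection);
    }

    // One handler thread hands off `calls` requests without blocking; returns the number of queued responses.
    static int run(int calls) {
        ExecutorService connection = Executors.newFixedThreadPool(10); // up to 10 calls in flight
        ExecutorService handler = Executors.newSingleThreadExecutor();  // a single handler thread
        BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
        CountDownLatch done = new CountDownLatch(calls);
        for (int i = 0; i < calls; i++) {
            final int id = i;
            // The handler submits the call, registers a callback, and is immediately free again.
            handler.execute(() -> downstreamCall(id, connection)
                .thenAccept(resp -> {
                    responseQueue.add(resp); // the callback, not the handler, enqueues the response
                    done.countDown();
                }));
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        handler.shutdown();
        connection.shutdown();
        return responseQueue.size();
    }

    public static void main(String[] args) {
        int responses = run(10);
        System.out.println("responses=" + responses + ", max in flight=" + maxInFlight.get());
    }
}
```

With one handler, several calls end up in flight at once because the handler never waits on a result; in a real router the callback would also need error handling, which this sketch omits.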
>> 
>> 
>>> On May 20, 2024, at 10:56, Yuanbo Liu <[email protected]> wrote:
>>> 
>>> Nice to see this feature brought up. I tried to implement this feature in
>>> our internal clusters and know that it's a very complicated feature. CC'ing
>>> hdfs-dev to bring in more discussion.
>>> By the way, I'm not sure whether the virtual threads in newer JDKs would
>>> help in this case.
>>> 
>>> On Mon, May 20, 2024 at 10:10 AM zhangjian <[email protected]> wrote:
>>> 
>>>> Hello everyone, the RPC of the HDFS router currently has some
>>>> shortcomings:
>>>> 
>>>> Currently the router's handler threads are synchronous. When a *handler*
>>>> thread adds a call to connection.calls, it must wait until the
>>>> *connection* notifies it that the call is complete, and only after the
>>>> response has been put into the response queue can it take a new call from
>>>> the call queue and process it. Therefore, the concurrency of the router is
>>>> limited by the number of handlers. A simple example: if there is 1 handler
>>>> and the connection thread allows at most 10 calls, then even though the
>>>> connection thread could send 10 requests to the downstream ns, the router
>>>> can only process requests one after another because there is only one
>>>> handler.
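To make the one-handler example concrete, here is a minimal sketch with hypothetical names (not the router's actual classes): a connection pool that could carry 10 calls, and a single handler that blocks on each call's future before taking the next one. The sketch records how many downstream calls were ever in flight at the same time.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

class SyncHandlerSketch {
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the largest number of downstream calls observed in flight at once.
    static int run(int calls) {
        ExecutorService connection = Executors.newFixedThreadPool(10); // could carry 10 calls at once
        ExecutorService handler = Executors.newSingleThreadExecutor();  // but there is only 1 handler
        for (int i = 0; i < calls; i++) {
            handler.submit(() -> {
                Future<?> call = connection.submit(() -> {
                    maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    sleepQuietly(20); // simulated downstream ns latency
                    inFlight.decrementAndGet();
                });
                call.get(); // the handler blocks here until the connection completes the call
                return null;
            });
        }
        handler.shutdown();
        try {
            handler.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        connection.shutdown();
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        System.out.println("max in flight = " + run(10)); // prints "max in flight = 1"
    }
}
```

Even though the connection pool has room for 10 calls, the blocking `get()` in the handler serializes them, which is the limitation described above.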
>>>> 
>>>> Since router RPC performance is mainly limited by the number of handlers,
>>>> the most effective way to improve it today is to increase the number of
>>>> handlers. However, having the router create a large number of handler
>>>> threads also increases thread context switching and still does not make
>>>> full use of the machine's resources.
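Little's law gives a rough way to quantify this. In the synchronous model each in-flight request pins one handler for its whole downstream latency $W$, so at request rate $\lambda$ about $\lambda W$ handlers are blocked at once, and a fixed pool of $N_{\text{handlers}}$ caps throughput at $N_{\text{handlers}}/W$. The numbers below are illustrative assumptions, not measurements from any cluster.

```latex
\begin{aligned}
N_{\text{busy}} &= \lambda \, W
  && \text{(handlers blocked at the same time)} \\
\lambda_{\max} &= \frac{N_{\text{handlers}}}{W}
  && \text{(throughput cap of the synchronous model)} \\
N_{\text{handlers}} = 100,\; W = 0.05\ \text{s}
  &\;\Rightarrow\; \lambda_{\max} = \frac{100}{0.05} = 2000\ \text{req/s} \\
N_{\text{handlers}} = 100,\; W = 1\ \text{s}
  &\;\Rightarrow\; \lambda_{\max} = \frac{100}{1} = 100\ \text{req/s}
\end{aligned}
```

The cap grows only linearly with the number of handler threads, and a slow ns that raises $W$ lowers it for every request the router serves.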
>>>> 
>>>> There are usually multiple nss downstream of the router. If a handler
>>>> forwards a request to a poorly performing ns, that handler waits for a
>>>> long time. With fewer handlers available, the router's ability to serve
>>>> requests for nss that are performing normally drops. From the client's
>>>> perspective, it then looks as if the downstream nss of the router have
>>>> degraded. We often see that the call queue of a downstream ns is short
>>>> while the call queue of the router is very long.
>>>> 
>>>> Therefore, although the main function of the router is to federate and
>>>> handle requests for multiple NSs, the current synchronous RPC performance
>>>> cannot meet the needs of deployments with many NSs downstream of the
>>>> router. Even though the router's concurrency can be improved by adding
>>>> handlers, it remains relatively slow: more threads mean more CPU time
>>>> spent on context switching, and in practice many of the handler threads
>>>> are blocked, which wastes thread resources. When a request enters the
>>>> router, there is no guarantee that a handler is free to run it at that
>>>> moment.
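When one client call has to reach several nss, the same CompletableFuture style lets the router fan the call out and merge the results without dedicating a waiting handler to each ns. A minimal sketch, assuming a hypothetical `invokeOnNs` with made-up latencies; `ns1`, `ns2` and `ns3` are placeholders, and the `Thread.sleep` on a pool thread only simulates ns latency.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;

class FanOutSketch {
    // Hypothetical per-ns RPC: completes on `pool` after the given simulated latency.
    static CompletableFuture<String> invokeOnNs(String ns, long latencyMs, Executor pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(latencyMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return ns + ":ok";
        }, pool);
    }

    // Sends the call to every ns concurrently; the returned future completes when all have answered.
    static CompletableFuture<List<String>> invokeAll(Map<String, Long> latencies, Executor pool) {
        List<CompletableFuture<String>> futures = latencies.entrySet().stream()
            .map(e -> invokeOnNs(e.getKey(), e.getValue(), pool))
            .collect(Collectors.toList());
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .thenApply(v -> futures.stream()
                .map(CompletableFuture::join) // safe: allOf guarantees every future is done
                .collect(Collectors.toList()));
    }

    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Map<String, Long> latencies = new LinkedHashMap<>(); // keeps ns order in the merged result
        latencies.put("ns1", 10L);
        latencies.put("ns2", 200L); // simulated slow ns
        latencies.put("ns3", 10L);
        try {
            return invokeAll(latencies, pool).join(); // join only to print the result in this demo
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [ns1:ok, ns2:ok, ns3:ok]
    }
}
```

In an asynchronous router the caller would attach a callback to the merged future rather than calling `join()`, so the total wait is roughly the slowest ns's latency and no handler sits blocked during it.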
>>>> 
>>>> 
>>>> Therefore, I am proposing asynchronous router RPC. Please see the issue
>>>> https://issues.apache.org/jira/browse/HDFS-17531 for the complete
>>>> solution.
>>>> 
>>>> You can also look at this PR: https://github.com/apache/hadoop/pull/6838.
>>>> It is only a demo, but it implements the core asynchronous RPC
>>>> functionality. If you think asynchronous routing is feasible, we can
>>>> consider splitting this PR into smaller pieces to make review easier.
>>>> 
>>>> The PDF is attached and can also be viewed in the JIRA issue.
>>>> 
>>>> Everyone is welcome to share ideas and join the discussion!
>>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>> 
>> 
